
extract e-mails from a file

Problem
You have a text file and you want to extract all the e-mail addresses from it. For research purposes, of course.

Solution

#!/usr/bin/env python3

import re
import sys

def extract_emails_from(fname):
    with open(fname, errors='replace') as f:
        for line in f:
            match = re.findall(r'[\w\.-]+@[\w\.-]+', line)
            for e in match:
                if '?' not in e:
                    print(e)
                    
def main():
    fname = sys.argv[1]
    extract_emails_from(fname)

##############################################################################

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Error: provide a text file!", file=sys.stderr)
        exit(1)
    # else
    main()

I had character encoding problems with some lines where the original program died with an exception. Using “open(fname, errors='replace')” will replace problematic characters with a “?“, hence the extra check before printing an e-mail to the screen.
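A quick way to check what errors='replace' does on each side is to try it on a short byte string. The sketch below uses made-up sample data containing the byte 0xFF, which is never valid in UTF-8, together with the same pattern as the script.

```python
import re

# Same pattern as in the script: [\w.-]+ on both sides of the @
EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+')

# Made-up sample: b'\xff' can never appear in valid UTF-8, so a strict
# decode would raise UnicodeDecodeError.
raw = b'mail me: jo\xffhn@example.com'

# Decoding with errors='replace' substitutes U+FFFD for the bad byte.
text = raw.decode('utf-8', errors='replace')
print('\ufffd' in text)         # True

# U+FFFD is not a word character, so the match starts after it.
print(EMAIL_RE.findall(text))   # ['hn@example.com']

# Encoding with errors='replace' is where the plain '?' comes from.
print('café'.encode('ascii', errors='replace'))  # b'caf?'
```

The findall output shows the trade-off: a corrupted byte inside an address yields a truncated address rather than no address, so badly damaged input may produce fragments.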

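The core of the script is the regular expression that finds the e-mail addresses. That tip is from here. The pattern is loose: it also accepts strings like a@b, and an address at the end of a sentence keeps the trailing period (jane.doe@example.com.).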
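To make the looseness concrete, here is a small comparison on a made-up sentence. The second pattern is a hypothetical stricter variant of my own, not the one from the tip: it requires at least one dot in the domain, requires the domain to end in two or more letters (so a sentence-ending period is not swallowed), and also allows + in the local part.

```python
import re

# The pattern used by the script.
LOOSE_RE = re.compile(r'[\w.-]+@[\w.-]+')

# Hypothetical stricter variant (not from the original tip): the domain
# must contain a dot and end in 2+ letters. The group is non-capturing,
# so findall() still returns the whole match.
STRICT_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}')

sample = 'Write to a@b or to jane.doe@example.com. Bye.'

print(LOOSE_RE.findall(sample))   # ['a@b', 'jane.doe@example.com.']
print(STRICT_RE.findall(sample))  # ['jane.doe@example.com']
```

Neither pattern validates addresses against RFC 5322; both only pick out plausible candidates from free text, which is all the script needs.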

Categories: python