Archive

Archive for October, 2017

extract e-mails from a file

October 10, 2017 Leave a comment

Problem
You have a text file and you want to extract all the e-mail addresses from it. For research purposes, of course.

Solution

#!/usr/bin/env python3

import re
import sys

def extract_emails_from(fname):
    with open(fname, errors='replace') as f:
        for line in f:
            match = re.findall(r'[\w\.-]+@[\w\.-]+', line)
            for e in match:
                if '?' not in e:
                    print(e)
                    
def main():
    fname = sys.argv[1]
    extract_emails_from(fname)

##############################################################################

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Error: provide a text file!", file=sys.stderr)
        exit(1)
    # else
    main()

I had character encoding problems with some lines where the original program died with an exception. Using “open(fname, errors='replace')” will replace problematic characters with a “?“, hence the extra check before printing an e-mail to the screen.

The core of the script is the regex to find e-mails. That tip is from here.

Advertisements
Categories: python Tags: , ,