Extract email addresses from text file using regex with bash or command line

How can I grep out only the email addresses, using a regex, from a file with multiple lines similar to the one below? (It is a SQL dump, to be precise.)

Unfortunately I cannot just go back and dump the email column at this point.

Example data:

62372,35896,1,cgreen,Chad,Green,cgreen@blah.com,123456789,0,,,,,,,,,3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0

I have tried this but it did not work:

grep -o '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}' file.csv

If you still want to go the grep -o route, this one works for me:

$ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
cgreen@blah.com
$ 

I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.

Your regular expression is close, but you're missing 2 things:

  • Regular expressions are case-sensitive, so either pass -i to grep or add a-z to your bracket expressions.
  • grep uses basic regular expressions by default, so the + quantifiers and the {} braces need to be escaped with backslashes (or use grep -E and leave them unescaped).
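The effect of the case-insensitivity flag can also be checked outside grep. The following is a minimal Python sketch, assuming the sample line from the question; `extract_emails` is a hypothetical helper name, and `re.IGNORECASE` stands in for grep's -i:

```python
import re

# Same pattern as the grep command. In Python's regex syntax, + and {} are
# quantifiers without backslashes; re.IGNORECASE plays the role of grep -i.
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def extract_emails(text):
    """Return every substring of text that matches EMAIL_RE."""
    return EMAIL_RE.findall(text)


# Sample row taken from the question.
line = ("62372,35896,1,cgreen,Chad,Green,cgreen@blah.com,123456789,0,,,,,,,,,"
        "3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0")

print(extract_emails(line))  # ['cgreen@blah.com']
```

Compiling the same pattern without `re.IGNORECASE` finds nothing in this line, which mirrors the original grep attempt failing on lowercase input.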

If you know the field position then it is much easier with awk or cut:

awk -F ',' '{print $7}' file

OR

cut -d ',' -f7 file
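For comparison, here is a rough Python equivalent of the field-position approach. It is only a sketch: `seventh_field` is a hypothetical helper, and it assumes, like the awk and cut commands, that no field contains a comma of its own.

```python
def seventh_field(line, sep=","):
    """Return the 7th separator-delimited field (1-based), like awk's $7."""
    fields = line.rstrip("\n").split(sep)
    # awk prints an empty string when the field does not exist; do the same.
    return fields[6] if len(fields) > 6 else ""


# Sample row taken from the question.
line = ("62372,35896,1,cgreen,Chad,Green,cgreen@blah.com,123456789,0,,,,,,,,,"
        "3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0")

print(seventh_field(line))  # cgreen@blah.com
```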

The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.

It is difficult to parse CSV with a regex, because of the possibility of escaped commas, quoted text, etc.

Consider that the following are valid email addresses according to Internet standards, because the local part may be written as a quoted string:

  • "foo,bar"@gmail.com
  • "foo\"bar"@gmail.com

If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).
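To illustrate why splitting on commas is fragile, here is a small sketch comparing a naive split with Python's built-in csv module. The row below is made-up data (not from the dump) with a quoted field that contains a comma:

```python
import csv
import io

# Hypothetical row: the second field is quoted and contains a comma.
row_text = '1,"Green, Chad",cgreen@blah.com\n'

# Naive approach: split on every comma, ignoring the quoting.
naive = row_text.rstrip("\n").split(",")

# CSV-aware approach: csv.reader honors the quotes.
parsed = next(csv.reader(io.StringIO(row_text)))

print(naive)   # ['1', '"Green', ' Chad"', 'cgreen@blah.com']
print(parsed)  # ['1', 'Green, Chad', 'cgreen@blah.com']
```

With the naive split the email ends up in the fourth position instead of the third, so a fixed field number would return the wrong column for this row.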

You can solve this with the help of the built-in csv module and the external validators module, like this:

import validators
import csv
import sys

with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        for field in row:
            if validators.email(field):
                print(field)

Run it like:

python3 script.py infile

That yields:

cgreen@blah.com
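If installing a third-party package is not an option, a standard-library-only variant might look like the sketch below. It swaps the validators check for the simple pattern from the question, so it is a looser, illustrative test rather than real email validation; `emails_in_csv` is a hypothetical helper name.

```python
import csv
import re

# Simple email-like pattern from the question; an assumption, not a full
# validator (for example, it rejects top-level domains longer than 4 letters).
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def emails_in_csv(lines):
    """Yield each CSV field that consists entirely of an email-like string."""
    for row in csv.reader(lines):
        for field in row:
            candidate = field.strip()
            if EMAIL_RE.fullmatch(candidate):
                yield candidate


# csv.reader accepts any iterable of strings, so a list works for a demo.
sample = [
    "62372,35896,1,cgreen,Chad,Green,cgreen@blah.com,123456789,0,,,,,,,,,"
    "3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0\n"
]

for email in emails_in_csv(sample):
    print(email)  # cgreen@blah.com
```

For a real file, a file object opened with `newline=''` can be passed to `emails_in_csv` in place of the list.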
