
Regex for grepping emails in files

I would like to extract and validate email addresses from the text files in a directory using bash.

My regex:

grep -Eoh \
         "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,8}\b" * \
         | sort -u > mail_list

This regex satisfies all my requirements, but it cannot exclude addresses such as:

^%&blah@gmail.com

and

with.dot@sale..department.company-name.com

(with two or more consecutive dots).

These kinds of addresses should be excluded.

How can I modify this regex to exclude these types of emails?
I can use only one expression for this task.

The email address ^%&blah@gmail.com is actually a valid email address: RFC 5322 permits the characters ^, % and & in the local part.

You can do this in Perl using the Email::Valid module (this assumes that each entry is on a new line):

perl -MEmail::Valid -ne 'print if Email::Valid->address($_)' file1 file2

file1

not email
abc@test.com

file2

not email
def@test.com
^%&blah@gmail.com
with.dot@sale..department.company-name.com

output

abc@test.com
def@test.com
^%&blah@gmail.com

Try this regex:

'\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'

I added an alphanumeric group to the front, to force emails to begin with at least one letter or digit, after which they may also contain symbols. Because both groups use +, the part before the @ sign must be at least two characters long.

After the @ sign, I added a group that matches one or more letters, digits or hyphens, followed by exactly one period. This group can be repeated, so it still matches long.domain.name.com, but two consecutive periods can no longer match.

Finally, the regex ends with the top-level domain as you had it, for example 'com'.
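
As a quick check of the explanation above, here is a minimal sketch that pipes a few sample lines through the regex. The sample addresses and the variable names (sample, regex, matches) are made up for illustration, and it assumes GNU grep, since \b is a GNU extension in -E mode.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, one candidate address per line.
sample='abc@test.com
user@long.domain.name.com
with.dot@sale..department.company-name.com
^%&blah@gmail.com'

# The regex from this answer, unchanged.
regex='\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'

# -E: extended regex, -o: print only the matched part of each line.
matches=$(printf '%s\n' "$sample" | grep -Eo "$regex")
printf '%s\n' "$matches"
```

With GNU grep this should print abc@test.com, user@long.domain.name.com and blah@gmail.com. The double-dot address is dropped entirely, while the last line still yields blah@gmail.com because the match simply starts after the symbols.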


Update:

Since \b matches a word boundary, and the symbols ^%& are not considered part of the word 'blah', the above will still match blah@gmail.com even though it is preceded by undesired characters. To avoid this, use a negative lookbehind. This requires grep -P (Perl-compatible regular expressions) instead of -E:

grep -P '(?<![%&^])\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'

The (?<![%&^]) tells the regex engine to continue matching only if the current position is not immediately preceded by one of the characters %, & or ^.
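
To see the effect of the lookbehind, the following sketch runs the -P version over a few sample lines. The sample data and variable names (sample, regex_pcre, matches_pcre) are hypothetical, and it assumes a GNU grep built with PCRE support, which not every system provides.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, one candidate address per line.
sample='abc@test.com
^%&blah@gmail.com
with.dot@sale..department.company-name.com
def@test.com'

# The lookbehind version from this answer, unchanged.
regex_pcre='(?<![%&^])\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'

# -P: Perl-compatible regex (needed for lookbehind), -o: print only the match.
matches_pcre=$(printf '%s\n' "$sample" | grep -Po "$regex_pcre")
printf '%s\n' "$matches_pcre"
```

Here only abc@test.com and def@test.com should be printed. The line starting with ^%& produces no match at all, because the only word boundary in front of a possible local part sits directly after &.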
