Regex for grepping emails in a file
I would like to validate emails from text files in a directory using bash.
My regex:
grep -Eoh \
"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,8}\b" * \
| sort -u > mail_list
This regex satisfies all my requirements, but it cannot exclude addresses such as:
^%&blah@gmail.com
and
with.dot@sale..department.company-name.com
(with two or more consecutive dots).
These kinds of addresses should be excluded.
How can I modify this regex to exclude these types of emails?
I can use only one expression for this task.
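To make the problem reproducible, here is a minimal sketch that runs the pattern above against a small sample. The temporary directory, the file name sample.txt, and the variable names are assumptions made for illustration; GNU grep is assumed for `\b` support.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; sample.txt is an illustrative name only.
workdir=$(mktemp -d)
printf '%s\n' \
  'abc@test.com' \
  '^%&blah@gmail.com' \
  'with.dot@sale..department.company-name.com' > "$workdir/sample.txt"

# The pattern from the question, unchanged.
original_matches=$(grep -Eoh \
  '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,8}\b' "$workdir/sample.txt" \
  | LC_ALL=C sort -u)

printf '%s\n' "$original_matches"
rm -rf "$workdir"
```

With this sample, the double-dot address is reported in full, and the first address is reported as blah@gmail.com because `\b` lets the match start after the leading symbols.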
The email address ^%&blah@gmail.com is actually a valid email address.
You can do this in Perl using the Email::Valid module (this assumes that each entry is on its own line):
perl -MEmail::Valid -ne 'print if Email::Valid->address($_)' file1 file2
Given input containing:
not email
abc@test.com
not email
def@test.com
^%&blah@gmail.com
with.dot@sale..department.company-name.com
the command prints:
abc@test.com
def@test.com
^%&blah@gmail.com
Try this regex:
'\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'
I added an alphanumeric group to the front to force emails to begin with at least one letter or number, after which they may also contain symbols.
After the @ sign, I added a group that matches one or more letters, digits, or hyphens followed by a single period. Because this group can repeat, the pattern still matches names such as long.domain.name.com, while two consecutive dots no longer match.
Finally, the regex ends with the final string as you had it, for example 'com'.
Since \b matches a word boundary, and the symbols ^%& are not considered part of the word 'blah', the regex above will still match blah@gmail.com even though it is preceded by undesired characters. To avoid this, use a negative lookbehind. This requires grep -P instead of -E:
grep -P '(?<![%&^])\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'
The (?<![%&^]) tells the regex engine to continue matching only if the string is not immediately preceded by one of the characters %, &, or ^.
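The lookbehind version can be tried the same way. This is a sketch that assumes a grep built with PCRE support (the -P option, as in typical GNU grep builds); the temporary directory, the file name sample.txt, and the variable names are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; sample.txt is an illustrative name only.
workdir=$(mktemp -d)
printf '%s\n' \
  'abc@test.com' \
  '^%&blah@gmail.com' \
  'with.dot@sale..department.company-name.com' > "$workdir/sample.txt"

# -P enables Perl-compatible syntax, which is needed for (?<!...).
# -o prints only the matched text, -h suppresses file names.
lookbehind_matches=$(grep -Poh \
  '(?<![%&^])\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b' \
  "$workdir/sample.txt")

printf '%s\n' "$lookbehind_matches"
rm -rf "$workdir"
```

Here the only position where \b could start a match in ^%&blah@gmail.com is right after &, which the lookbehind rejects, so that line produces no output.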