
Using grep to find all emails

How do I properly construct a regular expression for the Linux "grep" program to find all email addresses in, say, the /etc directory? Currently, my script is as follows:

grep -srhw "[[:alnum:]]*@[[:alnum:]]*" /etc

It works OK (I see some of the emails), but when I modify it to catch one or more characters before and after the "@" sign...

grep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc

...it stops working altogether.

Also, it doesn't catch emails of the form "Name.LastName@site.com".

Help!

Here is another example:

grep -Eiorh '([[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:].]{2,6})' "$@" * | sort -u > emails.txt

This variant works with third-level domains.
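To try the pipeline above on sample text rather than real files, here is a minimal sketch that reads standard input instead. It assumes GNU grep. The -r and -h flags and the file arguments are dropped because they only matter when searching files. The domain part uses a plain + because POSIX leaves adjacent repetition operators such as +? undefined, and grep -E has no lazy quantifiers. The helper name extract_emails is hypothetical.

```shell
# Hypothetical helper: read text on stdin, print each email-like string once.
extract_emails() {
  # -E extended regex, -i ignore case, -o print only the matched part
  grep -Eio '[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:].]{2,6}' | sort -u
}

printf '%s\n' \
  'contact: admin@mail.example.co.uk or root@localhost' \
  'owner=jane.doe@example.org' | extract_emails
# admin@mail.example.co.uk
# jane.doe@example.org
```

The address "root@localhost" is skipped because the pattern requires at least one dot after the @.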

Plain grep uses basic regular expressions (BRE). In BRE, ?, +, {, |, ( and ) are ordinary characters, and you must escape them with a backslash to use them as operators (\+ and \? are GNU extensions). You'll want to do one of these two:

grep -srhw "[[:alnum:]]\+@[[:alnum:]]\+" /etc

egrep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
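The difference is easy to see on a single line. This sketch assumes GNU grep, since the \+ form is a GNU extension to basic regular expressions. It runs the question's pattern three ways on a made-up line: as BRE, as escaped BRE, and as ERE.

```shell
line='mail root@example then'

# BRE: an unescaped + is a literal plus sign, so nothing matches here.
printf '%s\n' "$line" | grep -o '[[:alnum:]]+@[[:alnum:]]+' ||
  echo 'no match: + is literal in BRE'

# BRE with GNU's \+ extension: "one or more" of the preceding class.
printf '%s\n' "$line" | grep -o '[[:alnum:]]\+@[[:alnum:]]\+'

# ERE (grep -E, equivalent to egrep): + is an operator without a backslash.
printf '%s\n' "$line" | grep -Eo '[[:alnum:]]+@[[:alnum:]]+'
```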

I modified your regex to include punctuation (such as ., - and _) by changing it to:

egrep -ho "[[:graph:]]+@[[:graph:]]+"
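This sketch runs the pattern on a made-up sentence to show both what [[:graph:]] buys and why it is trigger-happy. It assumes GNU grep and uses grep -E in place of egrep. The pattern picks up the dotted local part from the question and the '+' address. However, [[:graph:]] covers every visible character, so surrounding punctuation such as the comma and the angle brackets stays attached to the match.

```shell
text='Mail Name.LastName@site.com, or ask <ops+alerts@mx.example.net>.'

# [[:graph:]] = any printable character except space, so punctuation is included.
printf '%s\n' "$text" | grep -Eo '[[:graph:]]+@[[:graph:]]+'
# Name.LastName@site.com,
# <ops+alerts@mx.example.net>.
```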

This is still pretty clean and matches... well, almost anything with an @ in it, of course. That includes third-level domains and addresses with '%' or '+' in them. See http://www.delorie.com/gnu/docs/grep/grep_8.html for good documentation on the character classes used.

In my example, the addresses were surrounded by whitespace, which made matching quite easy. If you grep through a mail server log, for example, you can add < > to make it match only the addresses:

egrep -ho "<[[:graph:]]+@[[:graph:]]+>"
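Here is a sketch of that on a single made-up Postfix-style log line. The log format is an assumption for illustration, and GNU grep is assumed. The unanchored pattern returns whole whitespace-delimited tokens, including a message ID that merely contains an @. The bracketed version returns only the <...> recipient.

```shell
log='postfix/smtp[4711]: 3F2A1: to=<alice@example.com>, relay=mx.example.com[192.0.2.10]:25, status=sent (250 OK id=1qX@mx.example.com)'

# Unanchored: any whitespace-delimited run containing @, punctuation included.
printf '%s\n' "$log" | grep -Eo '[[:graph:]]+@[[:graph:]]+'
# to=<alice@example.com>,
# id=1qX@mx.example.com)

# Anchored on < and >: only the bracketed recipient address.
printf '%s\n' "$log" | grep -Eo '<[[:graph:]]+@[[:graph:]]+>'
# <alice@example.com>
```

Because [[:graph:]] also matches > and <, two bracketed addresses written without a space between them would be captured as one match.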

@thomas, @glowcoder and @oedo are all right. The RFC that defines what an email address can look like (RFC 5322) is quite a fun read. (I've been using GNU grep 2.9, included in Ubuntu, for the commands above.)

Also check out zpea's version below; it should make for a less trigger-happy matcher.

I have used this one to filter email addresses identified by the 'at' symbol and delimited by whitespace within a text:

egrep -o "[^[:space:]]+@[^[:space:]]+" | tr -d "<>"

Of course, you can use grep -E (extended grep) instead of egrep. Note that the tr command removes the < and > characters that typically delimit email addresses.
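Here is a sketch of this whitespace-delimited approach on two made-up header lines. It assumes GNU grep and uses grep -E in place of egrep. The angle brackets around the first address are captured by the match and then stripped by tr. The function name extract_addresses is hypothetical.

```shell
# Hypothetical helper: whitespace-delimited tokens containing @, brackets removed.
extract_addresses() {
  # Any run of non-space characters around an @, then delete < and > characters.
  grep -Eo '[^[:space:]]+@[^[:space:]]+' | tr -d '<>'
}

printf '%s\n' 'From: Jane Doe <jane@example.com>' 'Cc: bob@example.org' | extract_addresses
# jane@example.com
# bob@example.org
```

Note that tr -d deletes every < and > in the output, not only those at the edges of a match.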

grep -E -o -r "[A-Za-z0-9][A-Za-z0-9._%+-]+@[A-Za-z0-9][A-Za-z0-9.-]+\\.[A-Za-z]{2,6}" /etc

This is adapted from an answer that was not originally mine, but I found it super helpful. It's from here:

http://www.shellhacks.com/en/RegEx-Find-Email-Addresses-in-a-File-using-Grep

They suggest:

grep -E -o -r "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}\\b" /etc

But it has certain false positives, like '+person..@example.com' or 'person@..com'. Also, the whitespace constraints miss things like "mailto:person@example.com" (not technically an email address, but it contains one). So I tweaked it a little.
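One of the listed false positives is easy to check. This sketch assumes GNU grep, which supports \b. It runs both patterns side by side on two made-up lines, written in single quotes so that each \\ from the double-quoted originals becomes a single \. The original pattern accepts 'person@..com'. The tweaked version requires the domain to start with a letter or digit, so it rejects that string.

```shell
input='person@..com
jane.doe@example.com'

# Pattern suggested on shellhacks.com.
printf '%s\n' "$input" | grep -Eo '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Tweaked pattern: local part and domain must each start with a letter or digit.
printf '%s\n' "$input" | grep -Eo '[A-Za-z0-9][A-Za-z0-9._%+-]+@[A-Za-z0-9][A-Za-z0-9.-]+\.[A-Za-z]{2,6}'
```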

(Adjust the grep options as you like; I don't know them very well.)

This recursive one works well for me:

grep -rIhEo "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" /etc/*
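Pointing the command at /etc gives results that differ from machine to machine. This sketch instead runs the same pattern against a throwaway directory; the layout and file names are made up, and GNU grep and mktemp are assumed. The -I flag makes grep skip files it detects as binary (here, one containing a NUL byte). The -h flag leaves file names out of the output. The function name find_emails_in is hypothetical.

```shell
# Hypothetical helper: recursive search of one directory for email-like strings.
find_emails_in() {
  # -r recurse, -I skip binary files, -h no file names, -E extended regex, -o matches only
  grep -rIhEo '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' "$1"
}

dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'maintainer: root@host.example.org\n' > "$dir/sub/config.txt"
# The NUL byte makes grep classify this file as binary, so -I skips it.
printf 'blob\0owner@binary.example.com\n' > "$dir/data.bin"

find_emails_in "$dir"   # prints root@host.example.org only
rm -rf "$dir"
```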

Just wanted to mention that this slight variation works great for grabbing mentions from things like Twitter tweets:

grep -Eiorh '(@[[:alnum:]_.-]+)' "$@" * | sort | uniq -c
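Here is a sketch of the mention counter on made-up tweets read from standard input. It assumes GNU grep, and -r, -h and the file arguments are dropped since no files are searched. An extra sort -rn puts the most frequent handles first. The function name count_mentions is hypothetical.

```shell
# Hypothetical helper: count @handles in text read from stdin, most frequent first.
count_mentions() {
  # -E extended regex, -i ignore case, -o print only the matched @handle
  grep -Eio '@[[:alnum:]_.-]+' | sort | uniq -c | sort -rn
}

printf '%s\n' \
  'thanks @alice and @bob_dev!' \
  '@alice great talk' | count_mentions
```

Because the class includes ".", a handle at the end of a sentence ("ping @alice.") is captured with its trailing period.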

This seems to work, but it also picks up file names that contain @:

egrep -osrwh "[[:alnum:]._%+-]+@[[:alnum:]]+\.[a-zA-Z]{2,6}" ~/.thunderbird/
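One way to deal with those file names is a second filter. This sketch keeps the regex from the answer above, reading standard input so that -s, -r and -h are dropped. It then discards matches ending in an image extension. The extension list and the function name find_emails are hypothetical choices, not part of the original answer, and GNU grep is assumed.

```shell
# Hypothetical helper: same regex as above, minus file-name look-alikes.
# -E extended regex, -o matches only, -w whole-word matches (as in the answer);
# the second grep drops matches such as logo@2x.png (hypothetical extension list).
find_emails() {
  grep -Eow '[[:alnum:]._%+-]+@[[:alnum:]]+\.[a-zA-Z]{2,6}' |
    grep -Eiv '\.(png|jpe?g|gif|svg|webp)$'
}

printf '%s\n' 'see logo@2x.png and user@example.com' | find_emails   # user@example.com
```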

I bet there is no better base regex than this one:

egrep -o "[a-zA-Z0-9_.+%-]+@[a-zA-Z0-9_.+%-]+\.[a-zA-Z0-9_.+%-]+"

It won't leave a single email behind in the garbage. What you still have to do is weed out things that look like an email but aren't, such as home_mobile@1x.png. Either look through the results manually, or make my regex more specific to what you want by adding more special characters. But no better base regex exists than this one.
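Escaping characters inside a bracket expression, as in [a-zA-Z0-9\_\.\+\%\-], is not needed. In a POSIX bracket expression the backslash is an ordinary character, so such a class also matches literal backslashes. This sketch shows the effect on a made-up string containing a backslash, and it assumes GNU grep.

```shell
sample='x\y@example.com'

# Backslash inside [...] is literal, so [a-z\_] also accepts "\".
printf '%s\n' "$sample" | grep -Eo '[a-z\_]+@[a-z.]+'

# Without the backslash the class admits only lowercase letters and underscore.
printf '%s\n' "$sample" | grep -Eo '[a-z_]+@[a-z.]+'
```

The first command prints the whole string, backslash included. The second prints only y@example.com.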

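Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.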

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM