
Extract emails from a given text

I am trying to extract a list of emails from a given text. Most of the emails have the following syntax:

 "Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>
or
"Last_name, First_Name (XXXX)" <last_name.first_name@domain.xxx>

My goal is to extract each whole email, including the first part, that is, the "Last_name, First_Name (XXXX)" portion.

To extract the list of emails, I have used the following regex:

"(<?[a-z0-9!#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9]>?)?)"

which extracts only the address, without the first part. That is, it extracts only:

<last_name.first_name@domain.xxx>

I have tried several variations of the regex to extract the first part, but unfortunately they don't work.

Please do not hesitate if you have any suggestions. Thank you in advance.

First, check this link, where you can test your regex with a handy reference alongside it:

https://regex101.com

Then, something like

"[a-zA-Z_]+, [a-zA-Z_( )]+"

should capture the first part. Maybe you can give us some more test text?
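
As a quick check, the pattern above can be run against the two sample lines from the question. This is only a sketch: the names `samples`, `suggested`, `with_hyphen` and `first_parts` are made up for illustration, and the second pattern (with `-` added to the character class) is an assumed tweak, not part of the original suggestion.

```python
import re

# The two sample formats from the question.
samples = [
    '"Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>',
    '"Last_name, First_Name (XXXX)" <last_name.first_name@domain.xxx>',
]

# Pattern as suggested above (the surrounding quotes are part of the pattern).
suggested = re.compile(r'"[a-zA-Z_]+, [a-zA-Z_( )]+"')

# Assumed tweak: allow "-" in the name part so "(First-name)" can match too.
with_hyphen = re.compile(r'"[a-zA-Z_]+, [a-zA-Z_( )-]+"')


def first_parts(pattern, lines):
    """Return the matched display-name part of each line, or None if no match."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(0) if m else None)
    return results


suggested_result = first_parts(suggested, samples)
with_hyphen_result = first_parts(with_hyphen, samples)
print(suggested_result)
print(with_hyphen_result)
```

The hyphen in "(First-name)" is not covered by the suggested character class, so it is worth testing both sample formats before relying on either pattern.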

 >>> import re
 >>>
 >>> emailLine='"Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>'
 >>>
 >>> re.findall(r'^"([^,]*?),\s([^"]*?)"\s<([^>]*?)>', emailLine)
 [('Last_name', 'First_Name (First-name)', 'last_name.first_name@domain.xxx')]
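
The session above matches a single line anchored with `^`. For a longer text containing several entries, something along the following lines might work: it drops the anchor and uses `re.finditer`, keeping the whole entry as well as its parts. The sample text, the second name, and the names `CONTACT_RE` and `extract_contacts` are hypothetical, made up for illustration.

```python
import re

# Same idea as the pattern above, without the "^" anchor so that it can match
# anywhere in the text; named groups are used purely for readability.
CONTACT_RE = re.compile(
    r'"(?P<last>[^",]*),\s(?P<first>[^"]*)"\s<(?P<address>[^<>\s]+)>'
)


def extract_contacts(text):
    """Return (whole entry, last name, first part, address) for each match."""
    return [
        (m.group(0), m.group("last"), m.group("first"), m.group("address"))
        for m in CONTACT_RE.finditer(text)
    ]


# Hypothetical input containing two entries.
text = (
    'To: "Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>; '
    '"Doe, Jane (XXXX)" <doe.jane@domain.xxx>'
)

contacts = extract_contacts(text)
for whole, last, first, address in contacts:
    print(whole)
```

Note that this only handles the quoted "Last_name, First_Name (...)" <address> form shown in the question; bare addresses or names containing quotes would need a different pattern.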
