Find emails in text with Python and regex

I am trying to extract emails from text. I used re.search, which returned the first occurrence, but then I went on and used re.findall. To my surprise, re.findall finds fewer emails than re.search. What could be the problem?

Code:

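import re

emails = set()  # unique addresses collected so far; `text` is the input string

# First attempt: re.search returns only the first match
searchObj = re.search(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
if searchObj:
    mail = searchObj.group()  # group() with no argument is the whole match
    if mail not in emails:
        emails.add(mail)

# Second attempt: re.findall, expected to return every match
listEmails = re.findall(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
for mail in listEmails:
    if mail not in emails:
        emails.add(mail)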
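A minimal reproduction makes the difference visible. It assumes a sample string, since the question does not show its text: re.search(...).group() returns the whole match, while re.findall on the same pattern (which has one capturing group) returns only what that group captured.

```python
import re

# Pattern from the question; note the capturing group (\.|-)
pattern = r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+'
# Hypothetical sample input; the question's `text` is not shown
text = 'some@mail.com and more email@somemore-here.com'

first = re.search(pattern, text).group()  # whole first match
found = re.findall(pattern, text)         # one group -> list of group values

print(first)  # some@mail.com
print(found)  # ['.', '-']
```

Adding the items of found to a set therefore stores '.' and '-' instead of addresses, which is why fewer real emails appear.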

Replace the capturing group (\.|-) with a non-capturing one, (?:\.|-), or even with a character class, [.-]:

r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'
                               ^^^^ 
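For comparison, here is a sketch of the non-capturing form (?:\.|-) next to the character-class form, again on a hypothetical sample string. Both should make re.findall return the full matches.

```python
import re

text = 'some@mail.com and more email@somemore-here.com'  # hypothetical sample

# (?:...) groups without capturing, so findall returns whole matches
non_capturing = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(?:\.|-)[A-Za-z0-9.-]+'
# [.-] matches a single '.' or '-' and needs no group at all
char_class = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'

print(re.findall(non_capturing, text))  # ['some@mail.com', 'email@somemore-here.com']
print(re.findall(char_class, text))     # ['some@mail.com', 'email@somemore-here.com']
```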

Or, even shorter:

r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
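One caveat worth checking, based on Python 3 re semantics: for str patterns, \w and [^\W_] also match non-ASCII letters and digits, so the shorter pattern is not strictly equivalent to the ASCII-only classes above. Passing re.ASCII restricts them to ASCII. The address below is a hypothetical example.

```python
import re

short = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
text = 'müller@mail.com'  # hypothetical address containing a non-ASCII letter

print(re.findall(short, text))            # ['müller@mail.com']
print(re.findall(short, text, re.ASCII))  # ['ller@mail.com'] ('ü' is not \w in ASCII mode)
```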

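Otherwise, re.findall returns only the captured values: when the pattern contains one capturing group, findall returns a list of what that group matched (here '.' or '-') instead of the full addresses, so your loop adds those separators to emails rather than email addresses.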
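If the capturing group is needed for some other reason, one option (a swapped-in approach, not part of the original answer) is re.finditer, which yields match objects; group(0) on each is always the full match, regardless of how many groups the pattern has. The sample string is hypothetical.

```python
import re

pattern = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+'  # keeps the group
text = 'some@mail.com and more email@somemore-here.com'  # hypothetical sample

# group(0) is the whole match; group(1) is still available when needed
full = [m.group(0) for m in re.finditer(pattern, text)]
seps = [m.group(1) for m in re.finditer(pattern, text)]

print(full)  # ['some@mail.com', 'email@somemore-here.com']
print(seps)  # ['.', '-']
```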

Python demo:

import re
rx = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
s = 'some@mail.com and more email@somemore-here.com'
print(re.findall(rx, s))
# => ['some@mail.com', 'email@somemore-here.com']
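Applied to the question's set-based collection, a minimal sketch could look like the following. extract_emails is a hypothetical helper name, and because emails is a set, the "if mail not in emails" check is unnecessary: a set already ignores duplicates.

```python
import re

# Group-free pattern, so findall returns full matches
EMAIL_RX = re.compile(r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+')

def extract_emails(text: str) -> set:
    """Hypothetical helper: collect the unique addresses found in text."""
    return set(EMAIL_RX.findall(text))

emails = extract_emails('a@x.com, b@y-z.org and a@x.com again')
print(sorted(emails))  # ['a@x.com', 'b@y-z.org']
```

To merge into an existing set instead, emails.update(EMAIL_RX.findall(text)) does the same job in one call.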
