I am trying to extract emails from text. I used re.search, which returned the first occurrence, but then I went on and used re.findall. To my surprise, re.findall finds fewer emails than re.search. What could be the problem?
Code:
searchObj = re.search(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
if searchObj:
    mail = searchObj.group()
    if mail not in emails:
        emails.add(mail)
listEmails = re.findall(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
for mail in listEmails:
    if mail not in emails:
        emails.add(mail)
Replace the capturing group (\.|-) with a non-capturing one, (?:\.|-), or even with a character class, [.-]:
r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'
                               ^^^^
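Or even shorter (note that in Python 3, \w and [^\W_] also match non-ASCII letters and digits unless the re.ASCII flag is used):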
r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
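Otherwise, re.findall returns only the captured values: when the pattern contains a capturing group, the result is a list of that group's contents, not the whole matches.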
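To make the difference concrete, here is a minimal sketch comparing the original pattern with a non-capturing variant. The sample text and the variable names (capturing, non_capturing, group_only, full_matches) are made up for illustration.

```python
import re

# Illustrative sample text (not from the question).
text = 'some@mail.com and more email@somemore-here.com'

# Original pattern: (\.|-) is a capturing group.
capturing = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+'
# Same pattern with the group made non-capturing via (?:...).
non_capturing = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(?:\.|-)[A-Za-z0-9.-]+'

# With one capturing group, findall returns that group's text for each match.
group_only = re.findall(capturing, text)
print(group_only)
# => ['.', '-']

# With no capturing groups, findall returns the whole matches.
full_matches = re.findall(non_capturing, text)
print(full_matches)
# => ['some@mail.com', 'email@somemore-here.com']
```

This would also explain the symptom in the question: the values added to the set from re.findall are '.' and '-', not email addresses.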
import re
rx = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
s = 'some@mail.com and more email@somemore-here.com'
print(re.findall(rx, s))
# => ['some@mail.com', 'email@somemore-here.com']
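If the capturing group has to stay in the pattern for some reason, one alternative (not part of the answer above, just a sketch) is re.finditer, which yields match objects, so the whole match is always available through group(0). The sample text and names below are hypothetical.

```python
import re

# Illustrative sample text with a repeated address.
text = 'some@mail.com and more email@somemore-here.com, again some@mail.com'

# The pattern from the question, capturing group left unchanged.
pattern = re.compile(r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+')

# group(0) is the full match regardless of any capturing groups;
# a set comprehension removes duplicates, so no "not in" check is needed.
emails = {m.group(0) for m in pattern.finditer(text)}
print(sorted(emails))
# => ['email@somemore-here.com', 'some@mail.com']
```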