find emails in text with python and regex

Question

I am trying to extract email addresses from text. I used re.search, which returned the first occurrence, and then switched to re.findall. To my surprise, re.findall finds fewer emails than re.search. What could be the problem?

Code:

searchObj = re.search(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
if searchObj:
    mail = searchObj.group()
    if mail not in emails:
        emails.add(mail)

listEmails = re.findall(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
for mail in listEmails:
    if mail not in emails:
        emails.add(mail)

Answer 1

Replace the capturing group (\.|-) with a non-capturing one, (?:\.|-), or even with a character class:

r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'
                               ^^^^

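Or even shorter (note that in Python 3, \w and [^\W_] also match non-ASCII letters and digits unless re.ASCII is passed):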

r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'

Otherwise, re.findall returns only the list of captured group values (here, the "." or "-" matched by the group) instead of the full matches.
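
To make the difference concrete, here is a minimal sketch that runs the question's original pattern and a non-capturing variant over the same input. The sample text and the variable names (capturing, non_capturing, group_values, full_matches) are assumptions chosen for illustration. With the capturing group, findall yields only "." and "-", so a set built from those values can hold at most two entries, which would explain seeing "fewer emails".

```python
import re

# Sample input, assumed for illustration.
text = 'some@mail.com and more email@somemore-here.com'

# Original pattern from the question: (\.|-) is a capturing group.
capturing = r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+'
# Same pattern with the group made non-capturing via (?:...).
non_capturing = r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(?:\.|-)[A-Za-z0-9\.-]+'

# With one capturing group, findall returns that group's text per match.
group_values = re.findall(capturing, text)
# With no capturing groups, findall returns the whole match.
full_matches = re.findall(non_capturing, text)

print(group_values)   # ['.', '-']
print(full_matches)   # ['some@mail.com', 'email@somemore-here.com']
```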

Python demo:

import re
rx = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
s = 'some@mail.com and more email@somemore-here.com'
print(re.findall(rx, s))
# => ['some@mail.com', 'email@somemore-here.com']
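
If the capturing group has to stay in the pattern for some other reason, one alternative (not part of the answer above, just a sketch) is re.finditer, which yields match objects whose group(0) is always the full match. The sample string and the emails set below are assumed for illustration; the set removes the duplicate address on its own, so no membership check is needed before add.

```python
import re

# Pattern keeps the capturing group from the question.
rx = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+'
# Sample input with one repeated address, assumed for illustration.
s = 'some@mail.com and more email@somemore-here.com, some@mail.com again'

emails = set()
for m in re.finditer(rx, s):
    # group(0) is the entire match, regardless of capturing groups.
    emails.add(m.group(0))

print(sorted(emails))
# ['email@somemore-here.com', 'some@mail.com']
```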

solution1
3 2016-12-28 09:04:07
