I want to validate more than 40k emails from a CSV file. The problem is that some rows in this file contain only blank spaces, or only the value <blank>. I removed many rows from my DataFrame using df.dropna(), but there are still rows with blank spaces. Now I want to validate these emails using a regular expression with Python's re library.
Here is my code:
import re
import pandas as pd
series = pd.Series(['test.123@gmail.com',
                    'two.dots.m12@gmail.com',
                    'test.test2.c@gmail.com.es',
                    'sam_alc12@congreso.gob.pe',
                    'hellowolrd.com',
                    '<blank>'])
# Raw string so that \. and \w reach the regex engine unchanged
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
for email in series:
    if re.search(regex, email):
        print("{}: Valid Email".format(email))
    else:
        print("{} : Invalid Email".format(email))
This was the output:
test.123@gmail.com: Valid Email
two.dots.m12@gmail.com : Invalid Email
test.test2.c@gmail.com.es : Invalid Email
sam_alc12@congreso.gob.pe : Invalid Email
hellowolrd.com : Invalid Email
<blank> : Invalid Email
However, there were 3 incorrect validations, for these emails:
two.dots.m12@gmail.com
test.test2.c@gmail.com.es
sam_alc12@congreso.gob.pe
All of them are valid emails. The current regex can't validate an email with two or more dots before or after the @.
I tried many modifications to the current regex, but none of them worked. I also tried email-validator, but it takes a lot of time because it checks that each one is a real email address.
For your given examples, the issue is that you are matching an optional . or _ only a single time.
Instead, you can optionally repeat matching either one of them, so that it matches multiple times but does not match consecutive .. or __
You don't have to escape the dot (\.) in the character class, and the @ does not have to be in square brackets ([@]).
^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$
^ Start of string
[a-z0-9]+ Match 1+ times any of the listed
(?:[._][a-z0-9]+)* Optionally repeat matching either . or _ and 1+ times one of the listed
@ Match literally
(?:\w+\.)+ Repeat 1+ times matching 1+ word chars and .
\w{2,3} Match 2-3 word chars
$ End of string
Note that this pattern accepts only a limited set of email addresses, as it allows only \w characters (plus the . separators and the @) to match.
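As a minimal sketch of how the pattern above could be applied to the sample data from the question: the helper name is_valid_email, the whitespace trimming with strip(), and the two extra padded/blank test values are assumptions added for illustration, not part of the original code. It uses a plain list, but iterating over a pandas Series should work the same way.

```python
import re

# Pattern from the answer, compiled once because it is reused for every row.
EMAIL_RE = re.compile(r'^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$')

def is_valid_email(value):
    """Return True if value, after trimming whitespace, matches EMAIL_RE.

    Hypothetical helper, not from the original post.
    """
    # Non-string cells (e.g. NaN from an empty CSV field) are treated as invalid.
    if not isinstance(value, str):
        return False
    # strip() removes the surrounding blank spaces mentioned in the question.
    return EMAIL_RE.search(value.strip()) is not None

emails = [
    'test.123@gmail.com',
    'two.dots.m12@gmail.com',
    'test.test2.c@gmail.com.es',
    'sam_alc12@congreso.gob.pe',
    'hellowolrd.com',
    '<blank>',
    '   ',                     # whitespace only (assumed extra case)
    ' test.123@gmail.com ',    # padded with spaces (assumed extra case)
]

results = {email: is_valid_email(email) for email in emails}
for email, ok in results.items():
    print("{!r}: {}".format(email, "Valid Email" if ok else "Invalid Email"))
```

With this pattern the three addresses from the question that were wrongly rejected are accepted, while hellowolrd.com, <blank> and the whitespace-only value are still rejected. Note that the local part only allows lowercase letters, so an address such as Test@gmail.com would be rejected unless the value is lowercased first or the pattern is compiled with re.IGNORECASE.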