Validate emails with dots using regex

I want to validate more than 40k emails from a CSV file. The problem is that some entries in this file contain blank spaces or only the value <blank>. I removed many rows from my DataFrame using df.dropna(), but there are still rows with blank spaces. Now I want to validate these emails with a regular expression, using Python's re module.
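Note that `dropna()` only removes real missing values (NaN/None), so empty strings, whitespace-only strings and the literal text `<blank>` survive it. A minimal sketch, assuming the addresses live in a column named `email` (a hypothetical name) and using made-up sample rows, is to strip whitespace, turn those placeholders into missing values, and only then drop them:

```python
import pandas as pd

# Hypothetical sample data; in practice this would come from pd.read_csv(...)
df = pd.DataFrame({"email": ["test.123@gmail.com", "  ", "<blank>", None,
                             " sam_alc12@congreso.gob.pe "]})

# Remove surrounding whitespace from every address
cleaned = df["email"].str.strip()
# Replace empty strings and the "<blank>" placeholder with NaN
cleaned = cleaned.mask(cleaned.isin(["", "<blank>"]))

df["email"] = cleaned
# Now dropna() also removes the former blank / placeholder rows
df = df.dropna(subset=["email"])

print(df["email"].tolist())
```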

Here is my code:

import re
import pandas as pd

series = pd.Series(['test.123@gmail.com',
                    'two.dots.m12@gmail.com',
                    'test.test2.c@gmail.com.es',
                    'sam_alc12@congreso.gob.pe',
                    'hellowolrd.com',
                    '<blank>'])

# Raw string so the backslashes reach the regex engine unchanged
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
for email in series:
    if re.search(regex, email):
        print("{}: Valid Email".format(email))
    else:
        print("{} : Invalid Email".format(email))

This was the output:

test.123@gmail.com: Valid Email
two.dots.m12@gmail.com : Invalid Email
test.test2.c@gmail.com.es : Invalid Email
sam_alc12@congreso.gob.pe : Invalid Email
hellowolrd.com : Invalid Email
<blank> : Invalid Email

However, 3 of these emails were validated incorrectly:

two.dots.m12@gmail.com
test.test2.c@gmail.com.es
sam_alc12@congreso.gob.pe

All of them are valid emails. The current regex can't validate an email with more than one dot (or underscore) before the @, or with more than one dot after the @.
I tried many modifications to the regex, but none of them worked. I also tried email-validator, but it takes a lot of time because it also checks whether each address is actually deliverable.

For your given examples, the issue is that before the @ you match an optional . or _ only a single time, and after the @ you allow only one \w+ followed by a single dot.

Instead, you can optionally repeat a group that matches either one of them followed by at least one allowed character. That way the separators can appear multiple times, but never consecutively (no .. or __).
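To see the difference in isolation, here is a small sketch that compares only the part before the @: the question's `[\._]?` allows at most one separator, while the repeated group `(?:[._][a-z0-9]+)*` allows any number of separators as long as each one is followed by at least one allowed character. The names `single` and `repeated` and the sample strings are illustrative only.

```python
import re

# Question's local part: at most one optional "." or "_"
single = re.compile(r"[a-z0-9]+[._]?[a-z0-9]+")
# Answer's local part: any number of "." / "_" separators, never consecutive
repeated = re.compile(r"[a-z0-9]+(?:[._][a-z0-9]+)*")

for local in ["test.123", "two.dots.m12", "a..b", "a._b", "trailing."]:
    # fullmatch requires the whole string to match the pattern
    print(local, bool(single.fullmatch(local)), bool(repeated.fullmatch(local)))
```

Only "test.123" passes both patterns; "two.dots.m12" passes only the repeated group, and the strings with consecutive or trailing separators fail both.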

You don't have to escape the . inside the character class ([\._] can be written as [._]), and the @ does not have to be in square brackets ([@] can simply be @).
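A quick way to check this is to compare the variants directly. Inside a character class `.` is already literal, and `@` has no special meaning in a Python regex, so each pair below agrees on every sample string; outside a class, however, an unescaped `.` matches any character.

```python
import re

pairs = [(r"[\.]", r"[.]"), (r"[@]", r"@")]
samples = [".", "@", "x"]

for escaped, plain in pairs:
    for s in samples:
        # Both patterns in a pair must agree on every sample
        assert bool(re.fullmatch(escaped, s)) == bool(re.fullmatch(plain, s))
print("equivalent")

# Outside a character class, "." is a wildcard and matches "x" too
print(bool(re.fullmatch(r".", "x")))
```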

^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$
  • ^ Start of string
  • [a-z0-9]+ Match 1+ times any of the listed characters
  • (?:[._][a-z0-9]+)* Optionally repeat matching either . or _ followed by 1+ of the listed characters
  • @ Match literally
  • (?:\w+\.)+ Repeat 1+ times matching 1+ word chars followed by a .
  • \w{2,3} Match 2-3 word chars
  • $ End of string
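Plugging the pattern back into the question's loop gives a quick check. This sketch uses a plain list instead of the pandas Series, compiles the pattern once (useful when running it over 40k rows) and keeps `re.search` with the `^...$` anchors as in the question; it illustrates the pattern and is not a complete email validator.

```python
import re

# Answer's pattern, compiled once and reused for every address
EMAIL_RE = re.compile(r"^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$")

emails = ['test.123@gmail.com',
          'two.dots.m12@gmail.com',
          'test.test2.c@gmail.com.es',
          'sam_alc12@congreso.gob.pe',
          'hellowolrd.com',
          '<blank>']

# Map each address to whether it matches the pattern
results = {email: bool(EMAIL_RE.search(email)) for email in emails}
for email, ok in results.items():
    print(f"{email}: {'Valid' if ok else 'Invalid'} Email")
```

With this pattern the first four addresses are reported as valid and the last two as invalid. One design note: `$` also matches just before a trailing newline, so `EMAIL_RE.fullmatch(...)` (which makes the anchors redundant) is slightly stricter if the CSV values might carry a stray `\n`.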

Note that this pattern accepts only a limited set of email addresses: the part before the @ allows only a-z, 0-9, . and _, and the part after it allows only \w characters and dots.
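To make that limitation concrete, the sketch below runs a few made-up addresses that are syntactically valid but rejected by this pattern: uppercase letters (accepted only if you pass `re.IGNORECASE`), a `+` tag in the local part, a hyphen in the domain, and a top-level domain longer than three characters.

```python
import re

PATTERN = r"^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$"

samples = ["John.Doe@gmail.com",     # uppercase letters
           "user+tag@gmail.com",     # "+" in the local part
           "info@my-site.com",       # hyphen in the domain
           "someone@example.info"]   # 4-letter top-level domain

for email in samples:
    # Compare the default (case-sensitive) match with a case-insensitive one
    strict = bool(re.search(PATTERN, email))
    ignore_case = bool(re.search(PATTERN, email, re.IGNORECASE))
    print(f"{email}: default={strict}, IGNORECASE={ignore_case}")
```

Only the uppercase address is rescued by `re.IGNORECASE`; the other three need changes to the character classes or to the `{2,3}` length limit.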
