Validate emails containing dots using regex

I want to validate more than 40k emails from a CSV file. The problem is that some entries in this file contain blank spaces, or consist only of the value <blank>. I removed many rows from my dataframe using df.dropna(), but there are still rows with blank spaces. Now I want to validate these emails using a regular expression with Python and the re library.

Here is my code:

import re

import pandas as pd

series = pd.Series(['test.123@gmail.com',
                    'two.dots.m12@gmail.com',
                    'test.test2.c@gmail.com.es',
                    'sam_alc12@congreso.gob.pe',
                    'hellowolrd.com',
                    '<blank>'])

# Raw string so the backslashes reach the regex engine unchanged
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
for email in series:
    if re.search(regex, email):
        print("{}: Valid Email".format(email))
    else:
        print("{} : Invalid Email".format(email))

This was the output:

test.123@gmail.com: Valid Email
two.dots.m12@gmail.com : Invalid Email
test.test2.c@gmail.com.es : Invalid Email
sam_alc12@congreso.gob.pe : Invalid Email
hellowolrd.com : Invalid Email
<blank> : Invalid Email

However, there were 3 incorrect validations, for these emails:

two.dots.m12@gmail.com
test.test2.c@gmail.com.es
sam_alc12@congreso.gob.pe

All of them are valid emails. The current regex can't validate an email with two or more dots before the @ or after it.
I tried many modifications to the current regex, but nothing worked. I also tried email-validator, but it takes a lot of time because it verifies that each address is a real email.

For your given examples, the issue is that you match an optional . or _ only a single time.

Instead, you can optionally repeat matching either one of them, so that it can occur multiple times without matching consecutive .. or __

You don't have to escape the . in the character class, and the @ does not have to be in square brackets.

^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$
  • ^ Start of string
  • [a-z0-9]+ Match 1+ times any of the listed characters
  • (?:[._][a-z0-9]+)* Optionally repeat matching either . or _ followed by 1+ of the listed characters
  • @ Match literally
  • (?:\w+\.)+ Repeat 1+ times matching 1+ word characters and a .
  • \w{2,3} Match 2-3 word characters
  • $ End of string
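
As a quick check, the pattern can be run against the sample addresses from the question. This is only a minimal sketch: the names EMAIL_RE, samples, is_valid and results are made up for illustration, and it uses a plain list rather than a pandas Series so that it needs nothing beyond the standard library.

```python
import re

# Pattern from the answer above, compiled once so it can be reused for many rows.
EMAIL_RE = re.compile(r'^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$')

# Sample addresses taken from the question.
samples = [
    'test.123@gmail.com',
    'two.dots.m12@gmail.com',
    'test.test2.c@gmail.com.es',
    'sam_alc12@congreso.gob.pe',
    'hellowolrd.com',
    '<blank>',
]


def is_valid(email):
    """Return True when the string matches the pattern (illustrative helper)."""
    return EMAIL_RE.search(email) is not None


results = {email: is_valid(email) for email in samples}
for email, ok in results.items():
    print("{}: {}".format(email, "Valid Email" if ok else "Invalid Email"))
```

With this pattern the first four addresses should be reported as valid, while hellowolrd.com and <blank> remain invalid.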


Note that this pattern accepts only a limited set of email addresses: the part before the @ may contain only a-z, 0-9, . and _, and the part after it only word characters (\w) separated by dots.
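
To make that limitation concrete, here is a small sketch with a few addresses the pattern turns away. The addresses and the names rejected_examples, rejected and padded_ok are hypothetical and chosen only for illustration; whether such addresses should count as valid depends on your data.

```python
import re

# Same pattern as in the answer above.
EMAIL_RE = re.compile(r'^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$')

# Hypothetical addresses chosen only to illustrate the limits of the pattern.
rejected_examples = [
    'John.Doe@gmail.com',    # uppercase letters before the @
    'user+tag@gmail.com',    # "+" is not in the allowed set
    'user@my-domain.com',    # "-" is not a word character
    'user@example.info',     # last label is longer than 3 characters
    ' test.123@gmail.com ',  # surrounding blank spaces
]

# Keep only the addresses that the pattern does not match.
rejected = [e for e in rejected_examples if EMAIL_RE.search(e) is None]
print(rejected)

# Stripping the surrounding whitespace first lets the padded address match.
padded_ok = EMAIL_RE.search(' test.123@gmail.com '.strip()) is not None
print(padded_ok)
```

All five addresses are rejected as written. Since the file in the question contains entries with blank spaces, calling strip() on each value before matching is one possible way to keep otherwise well-formed addresses from being discarded.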
