Validate emails with dots using regex
I want to validate more than 40k emails from a CSV file. The problem is that some entries in this file contain blank spaces, or contain only the value <blank>. I removed many rows from my dataframe using df.dropna(), but there are still rows with blank spaces. Now I want to validate these emails using a regular expression with Python and the re library.
Here is my code:
import re
import pandas as pd

series = pd.Series(['test.123@gmail.com',
                    'two.dots.m12@gmail.com',
                    'test.test2.c@gmail.com.es',
                    'sam_alc12@congreso.gob.pe',
                    'hellowolrd.com',
                    '<blank>'])

# Raw string so the backslashes reach the regex engine unchanged
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

for email in series:
    if re.search(regex, email):
        print("{}: Valid Email".format(email))
    else:
        print("{} : Invalid Email".format(email))
This was the output:
test.123@gmail.com: Valid Email
two.dots.m12@gmail.com : Invalid Email
test.test2.c@gmail.com.es : Invalid Email
sam_alc12@congreso.gob.pe : Invalid Email
hellowolrd.com : Invalid Email
<blank> : Invalid Email
However, there were 3 incorrect validations, for these emails:
two.dots.m12@gmail.com
test.test2.c@gmail.com.es
sam_alc12@congreso.gob.pe
All of them are valid emails. The current regex can't validate an email with two or more dots before the @ or after it.
I tried many modifications to the current regex, but none of them worked. I also used email-validator, but it takes a lot of time because it verifies that each address is a real email.
For your given examples, the issue is that you match an optional . or _ only once.
Instead, you can optionally repeat a group that matches either one of them followed by the allowed characters. That lets the separator occur multiple times, but does not match consecutive separators such as .. or __
You don't have to escape the . in the character class, and the @ does not have to be in square brackets.
^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$
^ : Start of string
[a-z0-9]+ : Match 1+ times any of the listed characters
(?:[._][a-z0-9]+)* : Optionally repeat matching either . or _ followed by 1+ of the listed characters
@ : Match literally
(?:\w+\.)+ : Repeat 1+ times matching 1+ word characters and .
\w{2,3} : Match 2-3 word characters
$ : End of string
Note that this pattern accepts a limited set of email addresses, allowing only \w characters to match in the domain part.
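As a rough sketch of how the pattern above could be applied to the sample addresses from the question, the snippet below compiles it once and strips surrounding whitespace before matching. The helper name is_valid_email and the two extra test values (one padded with spaces, one with consecutive dots) are assumptions added for illustration; they are not part of the original question or answer.

```python
import re

# Pattern from the answer; compiled once because it is reused for every value.
EMAIL_RE = re.compile(r'^[a-z0-9]+(?:[._][a-z0-9]+)*@(?:\w+\.)+\w{2,3}$')


def is_valid_email(value):
    """Return True if value, stripped of surrounding whitespace, matches EMAIL_RE.

    Hypothetical helper for illustration only.
    """
    if not isinstance(value, str):
        return False  # e.g. a NaN left over from reading the CSV
    return EMAIL_RE.search(value.strip()) is not None


emails = [
    'test.123@gmail.com',
    'two.dots.m12@gmail.com',
    'test.test2.c@gmail.com.es',
    'sam_alc12@congreso.gob.pe',
    'hellowolrd.com',
    '<blank>',
    '  test.123@gmail.com ',   # assumed example: padded with blank spaces
    'a..b@gmail.com',          # assumed example: consecutive dots
]

results = {email: is_valid_email(email) for email in emails}
for email, ok in results.items():
    print("{!r}: {}".format(email, "Valid Email" if ok else "Invalid Email"))
```

With a pandas Series, the same function could presumably be applied through series.map(is_valid_email) to get a boolean mask for filtering the dataframe. Because the pattern only allows lowercase letters and digits in the local part and 2-3 word characters at the end, addresses outside that range (for example with uppercase letters) would be rejected.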