Matching alphanumeric words, mentions or emails with Python regex

I've already read this and this and this and lots of others. None of them answers my problem.

I'd like to filter a string that may contain emails or strings starting with "@" (like emails, but without the text before the "@"). I've tested many patterns, and one of the simplest that begins to get close is:

import re
re.split(r'(@)', "test @aa test2 @bb @cc t-es @dd-@ee, test@again")
Out[40]: 
['test ', '@', 'aa test2 ', '@', 'bb ', '@', 'cc t-es ', '@', 'dd-', '@', 'ee, test', '@', 'again']

I'm looking for the right regexp that could give me:

['test ', '@aa', 'test2 ', '@bb ', '@cc', 't-es ', '@dd-', '@ee', 'test@again']

Why try to split when you can go "yo regex, give me all that matches":

import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Raw string, so that \s reaches the regex engine unchanged
print(
    re.findall(r"[^\s@]*?@?[^@]* |[^@]*@[^\s@]*", test)
)
# ['test ', '@aa test2 ', '@bb ', '@cc t-es ', '@dd-', '@ee, ', 'test@again']

I tried, but I couldn't make the regex any shorter. At least it works, and who expects a regex to be short anyway?


As per the OP's new (or corrected) requirements:

[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]* 
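For reference, here is a small sketch of how the revised pattern behaves on the string from the question. It assumes the pattern is used without any trailing whitespace; the names pattern and tokens are only illustrative. The result is close to the list the OP asked for, though not identical: most tokens keep their trailing space, and the comma stays attached to '@ee'.

```python
import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Revised pattern, written here without trailing whitespace (an assumption).
# First alternative: an optional word, an optional "@", a word, then one space.
# Second alternative: anything up to an "@" followed by non-space characters.
pattern = re.compile(r"[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]*")

tokens = pattern.findall(test)
print(tokens)
# ['test ', '@aa ', 'test2 ', '@bb ', '@cc ', 't-es ', '@dd-', '@ee, ', 'test@again']

# The matches are contiguous, so joining them rebuilds the input.
print("".join(tokens) == test)  # True
```

If the trailing spaces are unwanted, calling str.strip() on each token afterwards is probably simpler than complicating the pattern further.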

My own solution, based on a different email-parsing pattern plus a simple "@[:alphanum:]+" match, is:

import re

USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"  # following: domain name
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"  # first domain label
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)")  # one dot-separated label
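A quick usage sketch, in case it helps to see what this pattern extracts. The pattern is copied from above; the second sample string and the variable names mentions and mixed are made up for illustration. Note that this approach extracts only mentions and emails and drops the plain words. Also, as written the email branch requires a dot in the domain, so "test@again" from the question is reported as the mention "@again".

```python
import re

# Copy of the pattern above, so that this snippet runs on its own.
USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"  # following: domain name
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)")

# String from the question: only the "@..." parts are picked up.
mentions = USERNAME_OR_EMAIL_REGEX.findall(
    "test @aa test2 @bb @cc t-es @dd-@ee, test@again")
print(mentions)
# ['@aa', '@bb', '@cc', '@dd-', '@ee', '@again']

# Hypothetical sample with a dotted domain: the email branch matches in full.
mixed = USERNAME_OR_EMAIL_REGEX.findall("mail bob@example.com or @bob")
print(mixed)
# ['bob@example.com', '@bob']
```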
