I've already read this and this and this and lots of others. They don't solve my problem.
I'd like to filter a string that may contain emails or strings starting with "@" (like emails but without the text before the "@"). I've tested many patterns, but one of the simplest that begins to get close is:
import re
re.split(r'(@)', "test @aa test2 @bb @cc t-es @dd-@ee, test@again")
Out[40]:
['test ', '@', 'aa test2 ', '@', 'bb ', '@', 'cc t-es ', '@', 'dd-', '@', 'ee, test', '@', 'again']
I'm looking for the right regexp that could give me:
['test ', '@aa', 'test2 ', '@bb ', '@cc', 't-es ', '@dd-', '@ee', 'test@again']
Why try to split when you can go "yo regex, give me all that matches":
import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"
# Raw string so that \s reaches the regex engine without an invalid-escape warning.
print(re.findall(r"[^\s@]*?@?[^@]* |[^@]*@[^\s@]*", test))
# ['test ', '@aa test2 ', '@bb ', '@cc t-es ', '@dd-', '@ee, ', 'test@again']
I tried, but I couldn't make the regex any smaller. At least it works, and who expects a regex to be small anyway?
As per the OP's new (or corrected) requirements:
[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]*
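To see what the updated pattern returns, here is a minimal sketch that runs it on the sample string from the question. The names `PATTERN`, `tokens` and `stripped` are only illustrative, and the extra `strip()` step is an assumption about how one might tidy the result, not part of the answer above.

```python
import re

# Updated pattern from the answer, written as a raw string.
PATTERN = r"[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]*"

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

tokens = re.findall(PATTERN, test)
print(tokens)
# ['test ', '@aa ', 'test2 ', '@bb ', '@cc ', 't-es ', '@dd-', '@ee, ', 'test@again']

# Optional clean-up (an assumption, not in the answer): drop surrounding whitespace.
stripped = [t.strip() for t in tokens]
print(stripped)
# ['test', '@aa', 'test2', '@bb', '@cc', 't-es', '@dd-', '@ee,', 'test@again']
```

Most tokens keep the space that ended them, because the first alternative consumes it. The trailing comma on "@ee," also survives, so it would need separate handling if it is unwanted.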
My own solution, based on a different email-parsing regex plus simple "@[:alphanum:]+" parsing, is:
import re

USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"  # followed by the domain name:
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)")
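A quick usage sketch for this pattern, assuming it is meant to be used with `findall`. The first sample sentence is made up for illustration, and the names `sample`, `found`, `question_text` and `found_in_question` are hypothetical. The regex itself is copied unchanged from above.

```python
import re

USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"  # followed by the domain name:
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)")

# Made-up sample containing one bare username and one full email address.
sample = "ping @alice or bob@example.com"
found = USERNAME_OR_EMAIL_REGEX.findall(sample)
print(found)
# ['@alice', 'bob@example.com']

# The sample string from the question.
question_text = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"
found_in_question = USERNAME_OR_EMAIL_REGEX.findall(question_text)
print(found_in_question)
# ['@aa', '@bb', '@cc', '@dd-', '@ee', '@again']
```

Unlike the `findall` answer above, this pattern only extracts the usernames and emails and drops the surrounding words. As written, the final group has no quantifier, so the email branch requires exactly one dot-separated label after the first domain label. That is why "test@again" is picked up by the username branch as "@again" rather than as an email.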