
Matching alphanumeric words, mentions or emails with Python regex

I've already read this and this and this and lots of others. None of them solves my problem.

I'd like to filter a string that may contain emails or strings starting with "@" (like emails but without the text before the "@"). I've tried many patterns, but one of the simplest that begins to get close is:

import re
re.split(r'(@)', "test @aa test2 @bb @cc t-es @dd-@ee, test@again")
Out[40]: 
['test ', '@', 'aa test2 ', '@', 'bb ', '@', 'cc t-es ', '@', 'dd-', '@', 'ee, test', '@', 'again']

I'm looking for the right regexp that could give me:

['test ', '@aa', 'test2 ', '@bb ', '@cc', 't-es ', '@dd-', '@ee', 'test@again']

Why try to split when you can go "yo regex, give me all that matches":

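import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Raw string so that \s reaches the regex engine unchanged
# (a plain "\s" triggers an invalid-escape warning on recent Pythons).
print(
    re.findall(r"[^\s@]*?@?[^@]* |[^@]*@[^\s@]*", test)
)
# ['test ', '@aa test2 ', '@bb ', '@cc t-es ', '@dd-', '@ee, ', 'test@again']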

I tried but couldn't make the regex any shorter. At least it works, and who expects a regex to be short anyway?


As per the OP's new (or corrected) requirements:

[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]*
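To see what the revised pattern returns on the sample string from the question, here is a small sketch. The names `PATTERN`, `tokens` and `stripped` are only illustrative, and the final `strip()` step is an assumption about how one might drop the trailing spaces; it is not part of the answer above.

```python
import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Revised pattern: the first alternative now stops at whitespace,
# so plain words and @mentions come out as separate tokens.
PATTERN = re.compile(r"[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]*")

tokens = PATTERN.findall(test)
print(tokens)
# ['test ', '@aa ', 'test2 ', '@bb ', '@cc ', 't-es ', '@dd-', '@ee, ', 'test@again']

# Optional clean-up (assumption): remove the trailing space each token carries.
stripped = [t.strip() for t in tokens]
print(stripped)
# ['test', '@aa', 'test2', '@bb', '@cc', 't-es', '@dd-', '@ee,', 'test@again']
```

Note that the comma after `@ee` is kept, because the pattern only excludes whitespace and "@" from a token.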

My own solution combines a standard email pattern with a simple "@" + alphanumerics pattern for usernames:

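import re

USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"  # following: domain name:
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"  # first label
    # zero or more further ".label" parts; without the trailing "*",
    # "test@again" would only match as the username "@again"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*")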
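A quick way to try this pattern on the sample string from the question is sketched below. It assumes the final domain group is repeated with `*` (zero or more extra labels, as in the email pattern from the WHATWG HTML specification); the variable name `found` is only illustrative.

```python
import re

# Assumption: the last group carries a trailing "*", so a domain may have
# zero or more additional ".label" parts.
USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*")

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# All groups are non-capturing, so findall returns the full matches.
found = USERNAME_OR_EMAIL_REGEX.findall(test)
print(found)
# ['@aa', '@bb', '@cc', '@dd-', '@ee', 'test@again']
```

Unlike the `findall` patterns above, this one returns only the mentions and emails and skips the plain words. The hyphen stays in `@dd-` because the username class allows "-".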

