I'm trying to grab both usernames (such as abc123@) and emails (such as abc123@company.com) in the same Pythonic regex.
Here's an example statement:
abc123@ is a researcher at abc123@company.com doing cool work.
Regex used:
For username:
re.match("^([A-Za-z])+([@]){1}$")
For email:
re.match("^([A-Za-z0-9-_])+(@company.com){1}$")
In most cases, the username gets grabbed but not the email address (I'm trying to grab them as two separate entities). Any ideas what's going on?
Your regexes contain a lot of groups, repetition counts, and start/end anchors that are not really necessary. These two patterns are enough to find each item in the input string.
For user: [A-Za-z0-9]+@
For email: [A-Za-z0-9-_]+@company.com
If, however, you want to keep your groupings, these versions will work:
For user: ([A-Za-z0-9])+(@)
For email: ([A-Za-z0-9-_])+(@company.com)
Disclaimer: I have tested this only on Java, as I am not so familiar with Python.
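For what it's worth, here is a minimal Python sketch of the two ungrouped patterns above, run against the example statement from the question. It assumes you want every occurrence, so it uses re.findall rather than re.match (re.match only tries to match at the start of the string). The names USER_RE and EMAIL_RE are just illustrative, and the dot in company.com is escaped here so that it only matches a literal dot.

```python
import re

# Example statement from the question.
text = "abc123@ is a researcher at abc123@company.com doing cool work."

# Ungrouped patterns from this answer (dot escaped as an extra precaution).
USER_RE = re.compile(r"[A-Za-z0-9]+@")
EMAIL_RE = re.compile(r"[A-Za-z0-9-_]+@company\.com")

# findall scans the whole string instead of only position 0.
users = USER_RE.findall(text)
emails = EMAIL_RE.findall(text)

print(users)   # ['abc123@', 'abc123@']
print(emails)  # ['abc123@company.com']
```

Note that the user pattern also hits the abc123@ prefix of the email address, so on its own it does not keep the two kinds of token apart.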
In your patterns you use the anchors ^ and $ to assert the start and end of the string, so each pattern can only match when the entire string is a username or an email.
Removing the anchors leaves this for the username pattern: ([A-Za-z])+([@]){1}
Here, you can omit the {1} and the capture groups. Note that in the example, abc123@ has digits that you are not matching.
Still, using [A-Za-z0-9]+@ will get a partial match in the email abc123@company.com
To prevent that, you can use a right-hand whitespace boundary.
The username pattern might look like:
\b[A-Za-z0-9]+@(?!\S)
\b : a word boundary
[A-Za-z0-9]+ : match 1+ occurrences of the listed characters (including the digits)
@ : match literally
(?!\S) : negative lookahead, assert that there is no non-whitespace char to the right
For the email address, using a character class like [A-Za-z0-9-_] is quite strict.
If you want a broad match, you might use:
[^\s@]+@[^\s@]+\.[a-z]{2,}
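As a rough sketch of how these two patterns could be used in Python to pull the username and the email out as separate entities, assuming the example statement from the question (the variable names are made up for illustration):

```python
import re

# Example statement from the question.
text = "abc123@ is a researcher at abc123@company.com doing cool work."

# Username: word boundary, 1+ letters/digits, "@", then whitespace or end of string.
USERNAME_RE = re.compile(r"\b[A-Za-z0-9]+@(?!\S)")

# Broad email: 1+ chars that are neither whitespace nor "@", an "@", more such
# chars, then a dot followed by 2+ lowercase letters.
EMAIL_RE = re.compile(r"[^\s@]+@[^\s@]+\.[a-z]{2,}")

usernames = USERNAME_RE.findall(text)
emails = EMAIL_RE.findall(text)

print(usernames)  # ['abc123@']
print(emails)     # ['abc123@company.com']
```

Because of the (?!\S) lookahead, the username pattern no longer matches the abc123@ part of the email address, so the two result lists do not overlap.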