I have a sequence of emails of the form firstname.lastname@gmail.com.
I would like to get firstname, lastname and domain using regex.
I could manage to get the domain, like this:
domain = re.search('@.+', email).group()
but I'm having trouble with firstname and lastname.
Could you please explain how to do it?
You need to use parentheses in regular expressions in order to access the matched substrings. Notice that there are three pairs of parentheses (capture groups) in the regular expression below, for matching the first name, last name and domain, respectively.
m = re.match(r'(.*)\.(.*)@(.*)', email)
assert m is not None
firstname = m.group(1)
lastname = m.group(2)
domain = m.group(3)
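As a minimal sketch of the same idea, the three groups can also be read in one step with m.groups(), and an address that does not fit can be reported with None instead of an assertion. The helper name split_email is hypothetical and not part of the original answer.

```python
import re

def split_email(email):
    """Return (firstname, lastname, domain), or None if the address does not fit."""
    # Same pattern as in the answer: three capture groups.
    m = re.match(r'(.*)\.(.*)@(.*)', email)
    if m is None:
        return None
    return m.groups()

print(split_email("firstname.lastname@gmail.com"))  # ('firstname', 'lastname', 'gmail.com')
print(split_email("no-at-sign"))                    # None
```

Because .* is greedy, an address with more than one dot before the @ would put everything up to the last such dot into the first group.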
One more note: the prefix r is added to the regular expression string, to avoid duplicating the backslash character.
v = "firstname.lastname@gmail.com"
pattern = re.compile(r"(.*)\.(.*)@([a-z]+)\.[a-z]+")
pattern.findall(v)
Out[7]: [('firstname', 'lastname', 'gmail')]
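The output will be a list containing one tuple with the first name, the last name and the domain. Note that the third group captures only the part of the domain before the final dot (gmail rather than gmail.com).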
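If the full domain is wanted from this approach, one possible adjustment is to move the top-level part inside the third group. This is a sketch under that assumption, not part of the original answer, and the name full_pattern is made up for illustration.

```python
import re

v = "firstname.lastname@gmail.com"

# The third group now covers both the name and the top-level part of the domain.
full_pattern = re.compile(r"(.*)\.(.*)@([a-z]+\.[a-z]+)")

result = full_pattern.findall(v)
print(result)  # [('firstname', 'lastname', 'gmail.com')]
```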
If you want to use 3 capture groups, you can use negated character classes that exclude the characters you do not want to match. This prevents some of the unnecessary backtracking caused by .*
^([^\s@.]+)\.([^\s@.]+)@([^\s@]+)$
In parts, the pattern matches:
^ Start of string
([^\s@.]+) Capture group 1, match 1+ chars other than a whitespace char, . or @
\. Match a dot
([^\s@.]+) Capture group 2, match 1+ chars other than a whitespace char, . or @
@ Match an @ char
([^\s@]+) Capture group 3, match 1+ chars other than a whitespace char or @
$ End of string
See a regex demo and a Python demo.
For example:
import re
email = "firstname.lastname@gmail.com"
m = re.match(r'([^\s@.]+)\.([^\s@.]+)@([^\s@]+)$', email)
if m:
    print(m.groups())
Output
('firstname', 'lastname', 'gmail.com')
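Since the question mentions a sequence of emails, the same pattern could be compiled once and applied to each address in turn. The following is only a sketch: the names EMAIL_RE and parse_emails, the group names, and the sample addresses are assumptions made for illustration, and re.fullmatch is used here in place of the ^ and $ anchors.

```python
import re

# Same character classes as above, with named groups; compiled once for reuse.
EMAIL_RE = re.compile(r'(?P<first>[^\s@.]+)\.(?P<last>[^\s@.]+)@(?P<domain>[^\s@]+)')

def parse_emails(emails):
    """Yield (email, parts) pairs; parts is None when the address does not fit."""
    for email in emails:
        m = EMAIL_RE.fullmatch(email)  # the whole string must match
        yield email, (m.groupdict() if m else None)

emails = ["firstname.lastname@gmail.com", "jane.doe@example.org", "nolastname@gmail.com"]
for email, parts in parse_emails(emails):
    print(email, parts)
```

The last sample address has no dot before the @, so it does not match and is reported with None rather than raising an error.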