
Python, Regex, extract grouped emails within curly brackets

I'm trying to extract multiple email addresses from a string, using this regex:

re.findall(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+', text)

It works fine, but sometimes addresses that share a domain are grouped in curly brackets in the text:

{annie,bonnie}@gmail.com

So my question is how to parse this properly and extract the separate addresses:
annie@gmail.com, bonnie@gmail.com?

I've tried modifying the regex to account for the brackets and commas, followed by a simple function, but then I get a lot of garbage from the string.

Any help appreciated.

You can use

(?:{([^{}]*)}|\b\w[\w.-]*)(@[\w.-]+\.\w+)

See the regex demo. Details:

  • (?:{([^{}]*)}|\b\w[\w.-]*) - a non-capturing group matching either:
      • {([^{}]*)} - a {, then Group 1 capturing zero or more chars other than { and }, and then a }
      • | - or
      • \b\w[\w.-]* - a word boundary (it makes matching more efficient), a word char, and then zero or more word, dot or hyphen chars
  • (@[\w.-]+\.\w+) - Group 2: an @, one or more word, dot or hyphen chars, then a . and one or more word chars.
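To make the group behaviour described above concrete, here is a minimal sketch that runs the pattern on one braced and one plain address and records what each group holds. The braces are escaped here only for readability, and the names `rx`, `results` and `sample` are illustrative, not part of the original answer.

```python
import re

# Same pattern as above; the braces are escaped for readability (an assumption,
# the unescaped form is what the answer uses).
rx = re.compile(r'(?:\{([^{}]*)\}|\b\w[\w.-]*)(@[\w.-]+\.\w+)')

results = []
for sample in ("{annie,bonnie}@gmail.com", "annie2@gmail.com"):
    m = rx.search(sample)
    # Group 1 is None when the plain-address alternative matched.
    results.append((m.group(1), m.group(2), m.group()))
    print(results[-1])
# => ('annie,bonnie', '@gmail.com', '{annie,bonnie}@gmail.com')
# => (None, '@gmail.com', 'annie2@gmail.com')
```

Group 1 being None for a plain address is what lets the calling code tell the two cases apart.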

See a Python demo:

import re
text = "Emails like {annie,bonnie}@gmail.com, annie2@gmail.com, then a bonnie2@gmail.com."
emails = []
rx_email = re.compile(r'(?:{([^{}]*)}|\b\w[\w.-]*)(@[\w.-]+\.\w+)')
for m in rx_email.finditer(text):
    if m.group(1):
        for email in m.group(1).split(','):
            emails.append(f'{email}{m.group(2)}')
    else:
        emails.append(m.group())
print(emails)
# => ['annie@gmail.com', 'bonnie@gmail.com', 'annie2@gmail.com', 'bonnie2@gmail.com']

The logic is:

  • Match the emails, capturing the contents of any {...} in front of @ into Group 1 and the @... part into Group 2
  • If Group 1 matched, split its contents on commas and build the results by concatenating each comma-separated user name with the domain part
  • If Group 1 did not match, just append the whole match value to the resulting list.
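The three steps above can also be wrapped in a reusable function. The following is only a sketch: the name `expand_emails` is hypothetical, and stripping whitespace around names and skipping empty names (so that "{annie, bonnie}" also works) are assumptions that go beyond the original answer.

```python
import re

# Hypothetical helper; braces are escaped for readability.
RX_EMAIL = re.compile(r'(?:\{([^{}]*)\}|\b\w[\w.-]*)(@[\w.-]+\.\w+)')

def expand_emails(text):
    """Return all addresses in text, expanding {a,b}@domain groups."""
    emails = []
    for m in RX_EMAIL.finditer(text):
        if m.group(1) is not None:
            # Braced group: one address per non-empty, stripped name.
            for name in m.group(1).split(','):
                name = name.strip()
                if name:
                    emails.append(name + m.group(2))
        else:
            # Plain address: the whole match is the email.
            emails.append(m.group())
    return emails

print(expand_emails("Write to {annie, bonnie}@gmail.com or annie2@gmail.com."))
# => ['annie@gmail.com', 'bonnie@gmail.com', 'annie2@gmail.com']
```

Testing `m.group(1) is not None` rather than its truthiness means an empty group such as "{}@gmail.com" yields no addresses instead of being appended verbatim.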

You may use re.findall along with a list comprehension:

import re

inp = "{annie,bonnie}@gmail.com"
# [0] takes the first braced group only: (names, '@domain')
parts = re.findall(r'\{(.*?)\}(@\S+)\b', inp)[0]
emails = [email + parts[1] for email in parts[0].split(',')]
print(emails)

This prints:

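['annie@gmail.com', 'bonnie@gmail.com']

A regex-free alternative locates the braces by index and expands the group in place (the domain is hard-coded here):

x = 'xy2@gmail.com data@gmail.com google@gmail.com {annie,bonnie}@gmail.com'

# Record the positions of the opening and closing braces.
q = []
for i, j in enumerate(x):
    if j == '{' or j == '}':
        q.append(i)

# Text between the braces: 'annie,bonnie'
y1 = x[q[0]+1:q[1]]
a1 = y1.split(',')
z = [i + '@gmail.com' for i in a1]

# Swap the whole braced group, domain included, for the expanded addresses.
# (Replacing only the names would leave a duplicated '@gmail.com'.)
z1 = " ".join(z)
z2 = x.replace('{' + y1 + '}@gmail.com', z1)
print(z2)
# => xy2@gmail.com data@gmail.com google@gmail.com annie@gmail.com bonnie@gmail.com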
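The index-based snippet above handles a single braced group and a fixed domain. A possible regex-free generalisation is sketched below. It assumes addresses are separated by whitespace, that there are no spaces inside the braces, and that tokens carry no trailing punctuation; the function name `expand_tokens` is hypothetical.

```python
def expand_tokens(text):
    """Expand '{a,b}@domain' tokens; other tokens containing '@' pass through."""
    emails = []
    for token in text.split():
        if token.startswith('{') and '}@' in token:
            # Split '{a,b}@domain' into the name list and the domain.
            names, domain = token[1:].split('}@', 1)
            emails.extend(f'{name}@{domain}' for name in names.split(',') if name)
        elif '@' in token:
            emails.append(token)
    return emails

x = 'xy2@gmail.com data@gmail.com {annie,bonnie}@gmail.com {a,b}@example.org'
print(expand_tokens(x))
# => ['xy2@gmail.com', 'data@gmail.com', 'annie@gmail.com', 'bonnie@gmail.com', 'a@example.org', 'b@example.org']
```

For free-form text with punctuation around the addresses, a regex-based approach is likely the more robust choice.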
