Python, Regex, extract grouped emails within curly brackets
I'm trying to extract multiple emails from a string. I'm using this regex:
re.findall(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+', text)
It works fine, but sometimes the text groups email names that share the same domain in curly brackets:
{annie,bonnie}@gmail.com
So my question is how to properly parse it and extract the separate emails:
annie@gmail.com, bonnie@gmail.com
I've tried modifying the regex to account for the brackets and commas, followed by a simple function, but in that case I get a lot of garbage from the string.
Any help appreciated.
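For reference, here is a quick check of why the question's regex misses the grouped form: the [\w\.-]+ part before the @ cannot contain }, so nothing at all is matched for {annie,bonnie}@gmail.com. The sample string below is made up for illustration.

```python
import re

# Made-up sample mixing a grouped and an ordinary address
text = "{annie,bonnie}@gmail.com, annie2@gmail.com"

# The question's regex: the name part must end right before "@",
# and "}" is not in [\w\.-], so the grouped form never matches
found = re.findall(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+', text)
print(found)  # ['annie2@gmail.com']
```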
You can use
(?:{([^{}]*)}|\b\w[\w.-]*)(@[\w.-]+\.\w+)
See the regex demo. Details:
- (?:{([^{}]*)}|\b\w[\w.-]*) - a non-capturing group matching:
  - {([^{}]*)} - a {, then Group 1 capturing any zero or more chars other than { and }, and then a }
  - | - or
  - \b\w[\w.-]* - a word boundary (it will make matching more efficient), a word char, and then zero or more word, dot or hyphen chars
- (@[\w.-]+\.\w+) - Group 2: a @, one or more word, dot or hyphen chars, then a . and one or more word chars.

See a Python demo:
import re
text = "Emails like {annie,bonnie}@gmail.com, annie2@gmail.com, then a bonnie2@gmail.com."
emails = []
rx_email = re.compile(r'(?:{([^{}]*)}|\b\w[\w.-]*)(@[\w.-]+\.\w+)')
for m in rx_email.finditer(text):
    if m.group(1):
        # Group 1 holds the names inside {...}: build one email per name
        for email in m.group(1).split(','):
            emails.append(f'{email}{m.group(2)}')
    else:
        # Ordinary address: keep the whole match
        emails.append(m.group())
print(emails)
# => ['annie@gmail.com', 'bonnie@gmail.com', 'annie2@gmail.com', 'bonnie2@gmail.com']
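To see what the two groups hold, the sketch below reuses the answer's pattern and sample text and records Group 1, Group 2 and the whole match for every hit. Group 1 is None for ordinary addresses because that alternative did not take part in the match.

```python
import re

# Same pattern and sample text as the demo above
rx_email = re.compile(r'(?:{([^{}]*)}|\b\w[\w.-]*)(@[\w.-]+\.\w+)')
text = "Emails like {annie,bonnie}@gmail.com, annie2@gmail.com, then a bonnie2@gmail.com."

# (Group 1, Group 2, whole match) for every match
matches = [(m.group(1), m.group(2), m.group()) for m in rx_email.finditer(text)]
for g1, g2, whole in matches:
    print(repr(g1), repr(g2), repr(whole))
```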
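The logic is to match emails that have {...} in front of @, capturing the contents inside the braces into Group 1 and the @... part into Group 2. Ordinary emails are matched by the second alternative, so Group 1 stays empty for them and the whole match is kept; when Group 1 is set, it is split on commas and each name is joined with Group 2.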
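The same logic can be wrapped in a reusable function. This is a sketch, not part of the original answer: the name extract_emails is hypothetical, and it additionally strips spaces around each name (so {annie, bonnie}@gmail.com also works) and skips empty names such as the trailing one in {annie,}.

```python
import re

# Same pattern as the answer; the function name below is hypothetical
EMAIL_RX = re.compile(r'(?:{([^{}]*)}|\b\w[\w.-]*)(@[\w.-]+\.\w+)')

def extract_emails(text):
    """Return all emails in text, expanding {a,b}@domain groups."""
    emails = []
    for m in EMAIL_RX.finditer(text):
        names, domain = m.group(1), m.group(2)
        if names is None:
            # Ordinary address: keep the whole match
            emails.append(m.group())
        else:
            # Grouped address: strip spaces around each name, skip empty ones
            emails.extend(name.strip() + domain
                          for name in names.split(',') if name.strip())
    return emails

print(extract_emails("Contact {annie, bonnie}@gmail.com or carl@example.org."))
```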
You may use re.findall along with a list comprehension:
import re
inp = "{annie,bonnie}@gmail.com"
parts = re.findall(r'\{(.*?)\}(@\S+)\b', inp)[0]
emails = [email + parts[1] for email in parts[0].split(',')]
print(emails)
This prints:
['annie@gmail.com', 'bonnie@gmail.com']
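Because of the [0], the snippet above only expands the first brace group. A minimal variation that handles every group in the string iterates over all findall tuples; like the original pattern, it only returns grouped addresses, not plain ones. The second group and its domain in the sample are made up.

```python
import re

# Made-up sample with two brace groups and different domains
text = "{annie,bonnie}@gmail.com and {carl,dana}@example.org"

# findall returns one (names, domain) tuple per brace group
emails = [name + domain
          for names, domain in re.findall(r'\{(.*?)\}(@\S+)\b', text)
          for name in names.split(',')]
print(emails)
# ['annie@gmail.com', 'bonnie@gmail.com', 'carl@example.org', 'dana@example.org']
```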
x = 'xy2@gmail.com data@gmail.com google@gmail.com {annie,bonnie}@gmail.com'
q = []
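# Record the positions of the opening and closing braces
for i, j in enumerate(x):
    if j == '{' or j == '}':
        q.append(i)
# Text between the braces, e.g. "annie,bonnie"
y1 = x[q[0] + 1:q[1]]
a1 = y1.replace(',', " ")
a1 = a1.split(" ")
# Attach the (hard-coded) domain to each name
z = [i + '@gmail.com' for i in a1]
x = x.replace("{", '')
y = x.replace("}", '')
z1 = " ".join(z)
# Replace "annie,bonnie@gmail.com" (domain included) with the expanded addresses
z2 = y.replace(y1 + '@gmail.com', z1)
print(z2)
# => xy2@gmail.com data@gmail.com google@gmail.com annie@gmail.com bonnie@gmail.com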
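The approach above hard-codes @gmail.com and handles a single brace group. A more general sketch of the same expand-in-place idea uses re.sub with a replacement function instead of manual index bookkeeping: it rewrites every {a,b}@domain into space-separated addresses and then splits the string. It assumes, like the original, that the input is a whitespace-separated list of addresses; the helper name expand and the second brace group in the sample are hypothetical.

```python
import re

x = ('xy2@gmail.com data@gmail.com google@gmail.com '
     '{annie,bonnie}@gmail.com {carl,dana}@example.org')

def expand(m):
    # Hypothetical helper: m.group(1) holds the names, m.group(2) the "@domain" part
    return ' '.join(name.strip() + m.group(2) for name in m.group(1).split(','))

# Rewrite every {a,b}@domain as "a@domain b@domain" in place, then split on whitespace
emails = re.sub(r'\{([^{}]*)\}(@[\w.-]+\.\w+)', expand, x).split()
print(emails)
```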