re.findall failing for regex with grouping in Python
I'm writing a Python program that uses a regex to find email addresses. re.findall gives the wrong output whenever I use round brackets for grouping. Can anyone point out the mistake or suggest an alternative solution?
Here are two snippets of code to explain:
pat = "[\w]+[ ]*@[ ]*[\w]+.[\w]+"
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
gives the output
['abc@cs.stansoft', 'myacc@gmail.com']
However, if I use grouping in this regex and modify the code to
pat = "[\w]+[ ]*@[ ]*[\w]+(.[\w]+)*"
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
the output is
['.com', '.com']
To confirm that the regex is correct, I tried the second regex at http://regexpal.com/ with the same input string, and both email addresses were matched successfully.
In Python, re.findall returns the whole match only if the pattern contains no groups; if it contains groups, it returns the groups instead (a list of strings for one group, a list of tuples for several). To get around this, use a non-capturing group (?:...). In this case:
pat = r"[\w.]+ *@ *\w+(?:\.\w+)*"
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
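As a quick check of the behavior described above, the sketch below runs the same input through a capturing and a non-capturing variant of the pattern, and also shows re.finditer as one possible alternative. The variable names are illustrative only, and the dot is escaped in every variant so that the grouping is the only difference.

```python
import re

text = 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com '

# Capturing group: findall returns only what the group captured
# (the last repetition of the group for each match).
capturing = re.findall(r"\w+ *@ *\w+(\.\w+)*", text)

# Non-capturing group: findall returns the whole match again.
non_capturing = re.findall(r"\w+ *@ *\w+(?:\.\w+)*", text)

# finditer yields match objects, so group(0) is the whole match
# even when the pattern contains capturing groups.
via_finditer = [m.group(0) for m in re.finditer(r"\w+ *@ *\w+(\.\w+)*", text)]

print(capturing)      # ['.com', '.com']
print(non_capturing)  # ['abc@cs.stansoft.edu.com', 'myacc@gmail.com']
print(via_finditer)   # ['abc@cs.stansoft.edu.com', 'myacc@gmail.com']
```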
You would use groups if you wanted to do something like separate the user from the host. (The hyphens in the character classes are optional; some email addresses contain them.)
pat = r'([\w.-]+)@([\w.-]+)'
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
Output:
[('abc', 'cs.stansoft.edu.com'), ('myacc', 'gmail.com')]
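If both the full address and its parts are needed at once, one option is re.finditer, whose match objects expose the whole match as well as each group. This is only a sketch; the group names user and host are hypothetical labels chosen for illustration.

```python
import re

text = 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com '

# Named groups make the two parts easier to refer to later.
pat = re.compile(r'(?P<user>[\w.-]+)@(?P<host>[\w.-]+)')

# group(0) is the full match; the named groups give the parts.
results = [(m.group(0), m.group('user'), m.group('host'))
           for m in pat.finditer(text)]

for full, user, host in results:
    print(full, user, host)
```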
To illustrate further, we could replace the host and keep the user from group 1 (\1):
emails = 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com '
pat = r'([\w.-]+)@([\w.-]+)'
re.sub(pat, r'\1@live.com', emails)
Output:
'abc@live.com .rtrt.. myacc@live.com '
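The replacement passed to re.sub can also be a function that receives each match object, which helps when the substitution depends on what the groups captured. The sketch below assumes a hypothetical rule (rewrite only gmail.com addresses); swap_host is an illustrative name, not part of the re module.

```python
import re

emails = 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com '
pat = r'([\w.-]+)@([\w.-]+)'

def swap_host(match):
    # Hypothetical rule: rewrite gmail.com addresses, leave all others unchanged.
    user, host = match.group(1), match.group(2)
    if host == 'gmail.com':
        return user + '@live.com'
    return match.group(0)

result = re.sub(pat, swap_host, emails)
print(result)  # 'abc@cs.stansoft.edu.com .rtrt.. myacc@live.com '
```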
To match the whole email address instead, simply remove the parentheses from the pattern:
pat = r'[\w.-]+@[\w.-]+'
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
Output:
['abc@cs.stansoft.edu.com', 'myacc@gmail.com']