Python Regex: Backreference a matching regex group

I am trying to return 2 subgroups from my regex match:

import re
email_add = "John@Doe.com <John@Doe.com>"
m = re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)

But it doesn't seem to match:

>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I suspect I either did not group it correctly or am using an incorrect word boundary. I tried \w instead of \b, but the result is the same.

Could someone please point out my errors?

You are matching uppercase A-Z letters only, so the lowercase character sequences ohn, oe and com cause the pattern not to match anything.
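To see the effect in isolation, here is a minimal sketch that uses a cut-down pattern rather than the full one from the question. The variable names are only illustrative.

```python
import re

# [A-Z] is case-sensitive by default, so the match stops after the capital "J".
first_run = re.match(r"[A-Z]+", "John").group()

# Requiring "@" right after the uppercase run fails: "o" is not in [A-Z].
strict = re.match(r"[A-Z]+@", "John@Doe.com")

# With re.IGNORECASE (the long name for re.I) the same pattern matches.
relaxed = re.match(r"[A-Z]+@", "John@Doe.com", re.IGNORECASE)

print(first_run)        # J
print(strict)           # None
print(relaxed.group())  # John@
```

Because re.match returns None on failure, calling .group() on the result without checking it raises the AttributeError shown in the question.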

Adding the re.I case-insensitive flag makes your pattern work:

>>> import re
>>> email_add = "John@Doe.com <John@Doe.com>"
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add, re.I)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('John@Doe.com', 'John@Doe.com')

or you could add a-z to the character classes instead:

>>> re.match(r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b) <(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)", email_add)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('John@Doe.com', 'John@Doe.com')
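Since the title asks about backreferences: if the address inside the angle brackets is expected to repeat the first one, the second group could be replaced by \1. This is a sketch under that assumption, not something the question states. ADDRESS and PAIR are hypothetical names introduced here.

```python
import re

# Same address pattern as in the question, written once.
ADDRESS = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# \1 refers back to whatever text group 1 captured.
PAIR = re.compile(rf"({ADDRESS}) <\1>", re.IGNORECASE)

same = PAIR.match("John@Doe.com <John@Doe.com>")
different = PAIR.match("John@Doe.com <Jane@Doe.com>")

print(same.group(1))  # John@Doe.com
print(different)      # None
```

If the two addresses are allowed to differ, keep the two separate groups as in the examples above.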

What's wrong with your regex has already been pointed out, but you may also want to consider email.utils.parseaddr:

>>> from email.utils import parseaddr
>>> email_add = "John@Doe.com <John@Doe.com>"
>>> parseaddr(email_add)
('', 'John@Doe.com')  # doesn't get first part, so could assume it's same as 2nd?
>>> email_add = "John Doe <John@Doe.com>"
>>> parseaddr(email_add)
('John Doe', 'John@Doe.com') # does get name and email
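Following the comment in the transcript above, one possible way to handle a missing display name is to fall back to the address itself. This is only a sketch: name_and_address is a hypothetical helper, and the fallback rule is an assumption about what the caller wants.

```python
from email.utils import parseaddr

def name_and_address(value):
    """Return (name, address), using the address as the name when none is given."""
    name, address = parseaddr(value)
    # Assumption: an empty display name means "same as the address".
    return (name or address, address)

print(name_and_address("John Doe <John@Doe.com>"))  # ('John Doe', 'John@Doe.com')
print(name_and_address("John@Doe.com"))             # ('John@Doe.com', 'John@Doe.com')
```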
