Python regex not capturing groups properly
I have the following regex:
(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))
Input text examples:
Sample Code:
import re
# Raw string so that \w and \s reach the regex engine unchanged
pattern = re.compile(r'(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))')
result = pattern.findall('RE:11693 Miss Jane Doe 12345678')
For all 4 I expect the output:
('Miss Jane Doe', 'Miss', 'Jane', 'Doe')
However, for the 4th text example I get:
[('Miss Jane Doe', 'Miss', 'Jane', 'Doe'), (' Test Company', '', 'Test', 'Company')]
How can I get the correct output?
Just add ^ to the start of the regex so that it only matches at the start of the string. findall returns every non-overlapping match, so without the anchor any later "RE:..." or "Reference:" in the text produces an extra tuple. This makes it:
^(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))
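To illustrate the difference, here is a minimal sketch comparing the unanchored and anchored patterns. The fourth input text is not shown in the question, so the string below is a hypothetical input chosen only because it reproduces the reported output; the variable names are likewise made up for the example.

```python
import re

# Hypothetical input (an assumption): the original fourth example is not shown,
# so this string was picked to reproduce the output reported in the question.
text = 'RE:11693 Miss Jane Doe 12345678 Reference: Test Company'

# Pattern body from the question, written as a raw string.
body = r'(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))'

unanchored = re.compile(body)
anchored = re.compile('^' + body)  # ^ restricts the match to the start of the string

# findall returns one tuple per match, with '' for groups that did not participate.
before = unanchored.findall(text)
after = anchored.findall(text)

print(before)  # [('Miss Jane Doe', 'Miss', 'Jane', 'Doe'), (' Test Company', '', 'Test', 'Company')]
print(after)   # [('Miss Jane Doe', 'Miss', 'Jane', 'Doe')]
```

Note that ^ matches only at the very start of the string unless re.MULTILINE is passed, so if each input were one line of a larger multi-line string, that flag would probably be needed as well.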