Python re doesn't match last capture group
For the following code:
import re
t1 = 'tyler vs ryan'
p1 = re.compile('(.*?) vs (.*?)')
print(p1.findall(t1))
the output is:
[('tyler', '')]
but I would've expected this:
[('tyler', 'ryan')]
I have found that if I add a delimiter I can get it to work:
t2 = 'tyler vs ryan!'  # Notice the exclamation mark
p2 = re.compile('(.*?) vs (.*?)!')  # Notice the exclamation mark
print(p2.findall(t2))
outputs:
[('tyler', 'ryan')]
Is there a way I can get my matches without having a custom delimiter?
(.*?) is non-greedy: it matches as little as it can, which (after the vs at least) is the empty string.
Try (.*) or ([^ ]*) or something similar.
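As a rough sketch of the difference, the three variants mentioned above can be compared side by side. The variable names are only illustrative.

```python
import re

text = 'tyler vs ryan'

# Lazy second group: nothing follows it in the pattern, so it can
# succeed with zero characters.
lazy = re.findall(r'(.*?) vs (.*?)', text)
# Greedy second group: takes everything up to the end of the line.
greedy = re.findall(r'(.*?) vs (.*)', text)
# Negated class: takes characters up to the next space (or the end).
no_space = re.findall(r'(.*?) vs ([^ ]*)', text)

print(lazy)      # [('tyler', '')]
print(greedy)    # [('tyler', 'ryan')]
print(no_space)  # [('tyler', 'ryan')]
```

Note that ([^ ]*) stops at the first space, so it only fits when the second name is a single word.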
The regex is capturing the shortest string it can; that's what the question mark signifies. So as soon as it has matched the text vs it captures an empty string, then stops. This is what it looks like:
Direct link: https://regex101.com/r/hO4lM7/2
If you use:
re.compile('(.*?) vs (.*)')
that is, without the second question mark, it will capture the text after vs as well.
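One way to see where each version stops, as a minimal sketch, is to look at the overall match and the span of the second group using re.search:

```python
import re

text = 'tyler vs ryan'

lazy = re.search(r'(.*?) vs (.*?)', text)
greedy = re.search(r'(.*?) vs (.*)', text)

# The lazy version is satisfied right after ' vs ', so the overall
# match covers only 'tyler vs ' and group 2 is empty.
print(repr(lazy.group(0)), lazy.span(2))      # 'tyler vs ' (9, 9)
# The greedy second group extends the match to the end of the string.
print(repr(greedy.group(0)), greedy.span(2))  # 'tyler vs ryan' (9, 13)
```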
No. Try this:
import re
t1 = 'tyler vs ryan'
p1 = re.compile('(.*?) vs (.*?)$')
print(p1.findall(t1))
gives:
[('tyler', 'ryan')]
$ - Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
If you are assured of single-name combatants, you could use a regex like:
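To illustrate the MULTILINE remark, here is a small sketch. The two-line input and the names alice and bob are made up for the example and are not from the question.

```python
import re

pattern = r'(.*?) vs (.*?)$'

# The $ anchor forces the lazy second group to grow until the end.
single = re.findall(pattern, 'tyler vs ryan')

# Hypothetical two-line input to show the effect of MULTILINE.
lines = 'tyler vs ryan\nalice vs bob'
# Without the flag, $ only matches at the very end of the string,
# so only the last line produces a match.
without_flag = re.findall(pattern, lines)
# With the flag, $ also matches before each newline.
with_flag = re.findall(pattern, lines, re.MULTILINE)

print(single)        # [('tyler', 'ryan')]
print(without_flag)  # [('alice', 'bob')]
print(with_flag)     # [('tyler', 'ryan'), ('alice', 'bob')]
```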
r'\s*(\S+)\s*vs\s*(\S+)\s*'
Your use of findall() implies to me that you're expecting to match multiple pairings. If not, you may want to use search() with the ^ and $ regex special characters to bound your search more tightly.
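Both options might look roughly like this. The anchored variant is just the pattern above wrapped in ^ and $, and the sample inputs other than 'tyler vs ryan' are invented for illustration.

```python
import re

# Unanchored pattern from above, used with findall() for several pairings.
pairs = re.compile(r'\s*(\S+)\s*vs\s*(\S+)\s*')
many = pairs.findall('tyler vs ryan\nalice vs bob')

# Anchored variant (an assumption about what "more tightly bound" means):
# the whole string must be exactly one pairing.
one_pair = re.compile(r'^\s*(\S+)\s*vs\s*(\S+)\s*$')
hit = one_pair.search('tyler vs ryan')
miss = one_pair.search('tyler vs ryan extra')

print(many)          # [('tyler', 'ryan'), ('alice', 'bob')]
print(hit.groups())  # ('tyler', 'ryan')
print(miss)          # None
```

Because \S+ stops at whitespace, neither pattern handles names that contain spaces.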
The non-greedy ? is preventing the second word from being captured. It would be better to do:
r'(.*) vs (.*)'
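One thing to keep in mind with two greedy groups: if the text can contain more than one " vs ", the first group will run up to the last one. A quick sketch, using a made-up three-name input:

```python
import re

# Hypothetical input with two separators; not from the question.
text = 'tyler vs ryan vs sam'

# Greedy first group: splits at the LAST ' vs '.
both_greedy = re.findall(r'(.*) vs (.*)', text)
# Lazy first group, greedy second: splits at the FIRST ' vs '.
lazy_first = re.findall(r'(.*?) vs (.*)', text)

print(both_greedy)  # [('tyler vs ryan', 'sam')]
print(lazy_first)   # [('tyler', 'ryan vs sam')]
```

For the single-separator string in the question, both patterns give [('tyler', 'ryan')].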