Regular Expression Pattern Matching Order
In the regular expression engines of all the languages I'm familiar with, the .* notation indicates matching zero or more characters. Consider the following JavaScript code:
var s = "baaabcccb";
var pattern = new RegExp("b.*b");
var match = pattern.exec(s);
if (match) alert(match);
This outputs baaabcccb
The same thing happens with Python:
>>> import re
>>> s = "baaabcccb"
>>> m = re.search("b.*b", s)
>>> m.group(0)
'baaabcccb'
What is the reason that both of these languages match "baaabcccb" instead of simply "baaab"? The way I read the pattern b.*b is "find a substring that starts with b, then has any number of other characters, then ends with b." Both baaab and baaabcccb satisfy this requirement, yet both JavaScript and Python match the latter. I would have expected it to match baaab, simply because that substring satisfies the requirement and appears first.
So why does the pattern match baaabcccb in this case? And is there any way to modify this behavior (in either language) so that it matches baaab instead?
You can make the regex non-greedy by adding a ? after the *, like this: b.*?b. Then it will match the shortest string possible from the point where the match starts. By default the quantifier is greedy: it consumes as many characters as it can while still letting the rest of the pattern match, which is why you get the longest possible match here.
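
As a small illustration of the difference, here is a sketch in Python that reuses the string from the question and runs both forms of the pattern. The variable names greedy and lazy are only labels chosen for this example.

```python
import re

s = "baaabcccb"

# Greedy: .* first grabs everything up to the end of the string, then
# backs off just far enough for the final b to match, so the match
# ends at the last b.
greedy = re.search(r"b.*b", s).group(0)

# Non-greedy: .*? takes as few characters as it can, so the match
# ends at the first b that follows the opening b.
lazy = re.search(r"b.*?b", s).group(0)

print(greedy)  # baaabcccb
print(lazy)    # baaab
```

Both searches start at the same position (the first b); only the point where the match ends changes.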
.* is a greedy match. .*? is the non-greedy version.
Because * and + are both greedy by default (at least in Python; I am not sure about JS). They will try to match as much as possible. If you want to avoid this, you can add a ? after them.
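
A quick way to check this for + as well is the sketch below, again in Python and again using the string from the question; the variable names are just labels for this example. For what it's worth, the ? suffix on a quantifier is part of JavaScript's regular expression syntax too, so b.*?b should behave the same way there.

```python
import re

s = "baaabcccb"

# + is greedy: it takes the whole run of a's.
plus_greedy = re.search(r"a+", s).group(0)

# +? is non-greedy: it stops after the single a it is required to match.
plus_lazy = re.search(r"a+?", s).group(0)

# The same contrast inside the b...b pattern from the question.
span_greedy = re.search(r"b.+b", s).group(0)
span_lazy = re.search(r"b.+?b", s).group(0)

print(plus_greedy)  # aaa
print(plus_lazy)    # a
print(span_greedy)  # baaabcccb
print(span_lazy)    # baaab
```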
Here is a great tutorial about this, in the greedy vs. non-greedy section: Google's Python Class