
Regular Expression Pattern Matching order

In the regular expression engines of all the languages I'm familiar with, the .* notation matches zero or more characters. Consider the following JavaScript code:

var s = "baaabcccb";
var pattern = new RegExp("b.*b");
var match = pattern.exec(s);
if (match) alert(match);

This outputs baaabcccb.

The same thing happens with Python:

>>> import re
>>> s = "baaabcccb"
>>> m = re.search("b.*b", s)
>>> m.group(0)
'baaabcccb'

What is the reason that both of these languages match "baaabcccb" instead of simply "baaab"? The way I read the pattern b.*b is "find a substring that starts with b, then has any number of other characters, then ends with b." Both baaab and baaabcccb satisfy this requirement, yet both JavaScript and Python match the latter. I would have expected it to match baaab, simply because that substring satisfies the requirement and appears first.

So why does the pattern match baaabcccb in this case? And is there any way to modify this behavior (in either language) so that it matches baaab instead?

You can make the quantifier non-greedy by adding a ? after the *, like this: b.*?b. It will then match as few characters as possible, which gives baaab here. By default quantifiers are greedy and match as many characters as possible.
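To see the difference, here is a minimal sketch in Python using the string and pattern from the question. The variable names greedy and lazy are just illustrative.

```python
import re

s = "baaabcccb"

# Greedy: .* takes as many characters as it can, then backtracks to the last b.
greedy = re.search(r"b.*b", s)

# Non-greedy: .*? takes as few characters as it can, stopping at the first b it reaches.
lazy = re.search(r"b.*?b", s)

print(greedy.group(0))  # baaabcccb
print(lazy.group(0))    # baaab
```

The same b.*?b pattern should behave the same way in JavaScript, since both engines use the ? suffix for non-greedy quantifiers.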

.* is a greedy match. .*? is the non-greedy version.

Because * and + are greedy by default (in Python, and JavaScript behaves the same way). They will try to match as much as possible. If you want to avoid this, you can add a ? after them.
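A short Python sketch of the same idea for +, again using the string from the question. The last pattern, a.*?b, is my own addition to illustrate that a non-greedy match still starts at the leftmost possible position, so it is not always the shortest substring overall.

```python
import re

s = "baaabcccb"

# Greedy +: one or more characters, as many as possible.
plus_greedy = re.search(r"b.+b", s).group(0)

# Non-greedy +?: one or more characters, as few as possible.
plus_lazy = re.search(r"b.+?b", s).group(0)

# The search starts at the first a, so the result is aaab,
# even though the shorter substring ab also fits the pattern.
leftmost = re.search(r"a.*?b", s).group(0)

print(plus_greedy)  # baaabcccb
print(plus_lazy)    # baaab
print(leftmost)     # aaab
```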

Here is a great tutorial about this, in the greedy vs. non-greedy section: Google's Python Class.

