How exactly does the possessive quantifier work?

At the end of this page there is an attempted explanation of how greedy, reluctant and possessive quantifiers work: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

However, I tried an example myself and I don't seem to understand it fully.

I will paste my results directly:

Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.

Enter your regex: (.*)+foo
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Why does the first regex find no match while the second one does? What exactly is the difference between these two regexes?

A + placed after another quantifier means "don't allow the regex engine to backtrack into whatever the previous token has matched". This is what makes the quantifier possessive.

So when you apply .*foo to "xfooxxxxxxfoo", the .* first matches the entire string. Then, since foo can't be matched at the end of the string, the regex engine backtracks until a match becomes possible, which happens once .* has given back enough to match only "xfooxxxxxx" and foo can match "foo".

In .*+foo, the additional + prevents that backtracking from happening, so the match fails.
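To see the difference side by side, here is a minimal sketch using java.util.regex. The class name PossessiveVsGreedy and the helper firstMatch are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveVsGreedy {
    // Hypothetical helper: returns "start-end" of the first match, or "no match".
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "no match";
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backtracks so that foo can match.
        System.out.println(firstMatch(".*foo", input));   // prints 0-13
        // Possessive: .*+ takes everything and never gives anything back.
        System.out.println(firstMatch(".*+foo", input));  // prints no match
    }
}
```

The greedy result agrees with the "index 0 to index 13" output from the question, and the possessive result agrees with "No match found."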

When you write (.*)+foo, the + takes on an entirely different meaning: now it means "one or more of the preceding token", where the preceding token is the group (.*). You've created nested quantifiers, which is not a good idea, by the way. If you apply that regex to a string like "xfoxxxxxxxxxfox", you'll run into catastrophic backtracking.

The possessive quantifier takes as much of the string as it can and never gives any of it back; if the rest of the pattern cannot match after that, the match fails. In your case .*+ matches all of xfooxxxxxxfoo, but then you ask for another foo, which isn't present because nothing is left, so the matcher fails.

The greedy quantifier starts out the same way, but instead of failing it "backs off" one character at a time and tries again. Here is what .* holds at each step and whether foo can match after it:

xfooxxxxxxfoo fail
xfooxxxxxxfo fail
xfooxxxxxxf fail
xfooxxxxxx match
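The trace above can be reproduced with a small hand-written simulation. This is only a sketch of the idea, not how java.util.regex is implemented internally; the class GreedyBackoffTrace and the method trace are hypothetical names, and the loop assumes the pattern is simply .* followed by a literal.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyBackoffTrace {
    // Simulates ".*" + literal: let .* hold a prefix of the input, starting
    // with the whole string, and shrink it by one character after each failure.
    static List<String> trace(String input, String literal) {
        List<String> steps = new ArrayList<>();
        for (int end = input.length(); end >= 0; end--) {
            // Can the literal match right after what .* currently holds?
            boolean ok = input.startsWith(literal, end);
            steps.add(input.substring(0, end) + (ok ? " match" : " fail"));
            if (ok) {
                break;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : trace("xfooxxxxxxfoo", "foo")) {
            System.out.println(step);
        }
    }
}
```

A possessive quantifier corresponds to stopping after the first iteration of this loop: the first attempt fails and there is no second one.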

In your second regex you are asking for something else, because the grouping changes what the + applies to. The + now relates to the () rather than to the *, so you are asking for "one or more matches of (.*)", and there is one such match. Since nothing in that regex is possessive, the engine is free to backtrack and foo can match at the end.
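One way to check how the two patterns are read is to count capturing groups and compare the results. The following is a sketch with made-up names (GroupedPlusDemo, groupCount, found); Matcher.groupCount() and Matcher.find() are the standard JDK methods.

```java
import java.util.regex.Pattern;

class GroupedPlusDemo {
    // Number of capturing groups the pattern declares.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Whether the pattern matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // .*+ is a single possessive quantifier: no group, no backtracking.
        System.out.println(groupCount(".*+foo") + " " + found(".*+foo", input));     // prints 0 false
        // (.*)+ is a greedy group repeated one or more times: backtracking allowed.
        System.out.println(groupCount("(.*)+foo") + " " + found("(.*)+foo", input)); // prints 1 true
    }
}
```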
