
RegExp exercise: reluctant quantifier with a lookahead assertion

Can you explain how this works? Here is an example:

<!-- The quick brown fox 
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->

<!-- Pack my box with five dozen liquor jugs -->

First, I tried to use the following regular expression to match the content inside conditional comments:

/<!--.*?stylesheet.*?-->/s

It failed, because the match starts at the very first <!-- in the document instead of the one that opens the conditional comment. Then I tried another pattern, this time with a lookahead assertion:

/<!--(?=.*?stylesheet).*?-->/s

It works and matches exactly what I need. However, the following regular expression works as well:

/<!--(?=.*stylesheet).*?-->/s

The last regular expression does not have a reluctant quantifier in the lookahead assertion, and now I am confused. Can anyone explain how it works? Maybe there is a better solution for this example?

Updated:

I tried using the regular expressions with a lookahead assertion in another document, and they failed to match the content between the comments. So, this one /<!--(?=.*?stylesheet).*?-->/s (as well as this one /<!--(?=.*stylesheet).*?-->/s ) is not correct. Do not use it; try the other suggestions instead.
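A likely reason for the failure can be sketched in Python. This is a minimal sketch, assuming Python's re module behaves like PCRE for these constructs, with re.DOTALL standing in for the /s modifier; the helper name first_match is made up for illustration. The lookahead only asserts that stylesheet occurs somewhere ahead and consumes nothing, so the trailing .*?--> still stops at the first --> after the opening <!--.

```python
import re

# Sample document from the question.
DOC = """<!-- The quick brown fox
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->

<!-- Pack my box with five dozen liquor jugs -->
"""


def first_match(pattern: str, text: str) -> str:
    """Return the first match of pattern in text, or "" if there is none."""
    match = re.search(pattern, text, re.DOTALL)  # re.DOTALL plays the role of /s
    return match.group(0) if match else ""


lazy_result = first_match(r"<!--(?=.*?stylesheet).*?-->", DOC)
greedy_result = first_match(r"<!--(?=.*stylesheet).*?-->", DOC)

# Both patterns return the same text, and it is the first comment,
# not the conditional comment that contains "stylesheet".
print(lazy_result == greedy_result)
print("stylesheet" in lazy_result)
print(lazy_result)
```

In this sketch both patterns return the comment about the fox, because nothing ties the position of stylesheet to the text that .*?--> actually consumes.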

Updated:

The solution was found by Jonny 5 (see the answer). He suggested three options:

  1. Using a negated hyphen to limit the match. This option works only if there is no hyphen between the tags. If a stylesheet has a URL such as /style-sheet.css , it will not work.
  2. Using the escape sequence \K . It works like a charm. The downsides are the following:
    • It is terribly slow (in my case, it was 8-10 times slower than the other solutions)
    • Only available since PHP 5.2.4
  3. Using a lookahead to narrow the match. This is the goal I tried to achieve, but my experience with lookaround assertions was insufficient to perform the task.

I think the following is a good solution for my example:

/(?s)<!--(?:(?!<!).)+?stylesheet.+?-->/

The same, but with the s modifier at the end:

/<!--(?:(?!<!).)+?stylesheet.+?-->/s

As I said, this is a good solution, but I managed to improve the pattern and found another one that works faster in my case.

So, the final solution is the following:

/<!--(?:(?!-->).)+?stylesheet.+?-->/s
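As a quick check of the final pattern, here is a minimal sketch in Python, assuming re.DOTALL is an adequate stand-in for the /s modifier. The name FINAL is chosen here for illustration only. The group (?:(?!-->).)+? advances one character at a time and refuses to step onto a -->, so the part before stylesheet cannot leave the comment in which the match started.

```python
import re

# Sample document from the question.
DOC = """<!-- The quick brown fox
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->

<!-- Pack my box with five dozen liquor jugs -->
"""

# The final pattern from the question; re.DOTALL replaces the trailing /s.
FINAL = re.compile(r"<!--(?:(?!-->).)+?stylesheet.+?-->", re.DOTALL)

# findall returns whole matches because the pattern has no capturing groups.
matches = FINAL.findall(DOC)

for text in matches:
    print(text)
```

On the sample document this should yield only the conditional comment; the two plain comments are skipped because they contain no stylesheet before their closing -->.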

Thanks to all the participants for the interesting answers.

The string stylesheet is mentioned only once in your test document, so both regular expressions you tried will match the same thing, but in different ways.

/<!--(?=.*?stylesheet).*?-->/s

This one does the following:

  • Match the literal <!-- .
  • Look ahead, scanning characters up to and including the first stylesheet . The lookahead consumes nothing. Fail if not found.
  • Match characters up to and including --> .

/<!--(?=.*stylesheet).*?-->/s

This one does the following:

  • Match the literal <!-- .
  • Look ahead, scanning any character until no longer possible (the end of the string). Backtrack one character at a time, trying to match stylesheet at each step. Fail if not found.
  • Match characters up to and including --> .

Basically, one needs to backtrack significantly while the other doesn't.

If your subject instead is...

<!-- The quick brown fox 
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" /> <![endif]-->

<!-- Pack my box with five dozen stylesheets -->

the two lookaheads behave differently. The former finds the first stylesheet , while the latter finds the second (and last), since the greedy .* first runs to the end of the string and then backtracks from there.
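The difference between the two lookaheads can be made visible by adding a capturing group inside each one. This is a minimal sketch in Python, assuming its re module treats these constructs like PCRE; the extra group and the variable names are illustration only and are not part of the original patterns.

```python
import re

# Second subject from the answer: "stylesheet" now occurs twice.
SUBJECT = """<!-- The quick brown fox
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" /> <![endif]-->

<!-- Pack my box with five dozen stylesheets -->
"""

# Group 1 records what the lookahead scanned; the lookahead itself consumes nothing.
lazy = re.search(r"<!--(?=(.*?stylesheet))", SUBJECT, re.DOTALL)
greedy = re.search(r"<!--(?=(.*stylesheet))", SUBJECT, re.DOTALL)

first_end = SUBJECT.index("stylesheet") + len("stylesheet")
last_end = SUBJECT.rindex("stylesheet") + len("stylesheet")

# The lazy lookahead stops at the first occurrence, the greedy one at the last.
print(lazy.end(1) == first_end)
print(greedy.end(1) == last_end)
```

Both searches start at the same <!-- ; only the distance the lookahead scans differs, which is where the extra backtracking in the greedy version comes from.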

To match only the part <!-- ... stylesheet ... --> there are many ways:

1.) Use a negated hyphen [^-] to limit the match and stay in between <!-- and stylesheet

(?s)<!--[^-]+stylesheet.+?-->

[^-] allows only characters that are not a hyphen. See the test at regex101.
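The negated hyphen variant, and its limitation, can be sketched in Python as follows. This is a minimal sketch that assumes Python's re module matches PCRE behaviour here. HYPHEN_DOC is a hypothetical document, not taken from the question, in which a hyphenated URL appears before stylesheet.

```python
import re

# Sample document from the question.
DOC = """<!-- The quick brown fox
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->

<!-- Pack my box with five dozen liquor jugs -->
"""

# Hypothetical variant: a hyphen sits between <!-- and "stylesheet".
HYPHEN_DOC = """<!--[if IE 7]>
    <link href="/style-sheet.css" rel="stylesheet" />
<![endif]-->
"""

NEGATED = re.compile(r"(?s)<!--[^-]+stylesheet.+?-->")

ok = NEGATED.search(DOC)
blocked = NEGATED.search(HYPHEN_DOC)

# [^-]+ cannot cross the "-->" of the first comment, so the match starts
# at the conditional comment.
print(ok.group(0))
# [^-]+ also cannot cross the hyphen in the URL, so nothing matches here.
print(blocked)
```

The pattern is cheap because [^-]+ is a plain character class, but any hyphen before stylesheet ends the match attempt.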


2.) To get the "last" or closest match without much regex effort, you can also put a greedy dot in front to eat up everything before it. This makes sense if you are not matching globally, i.e. there is only one item to match. Use \K to reset the start of the match after the greed:

(?s)^.*\K<!--.+?stylesheet.+?-->

See the test at regex101. You can also use a capture group and grab $1: (?s)^.*(<!--.+?stylesheet.+?-->)
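Python's standard re module does not support \K, so this minimal sketch uses the capture group form instead. It assumes that form behaves the same way as in PCRE; the name LAST is chosen here for illustration. The greedy ^.* runs to the end of the string and backtracks until the rest of the pattern fits, which selects the last <!-- that is followed by stylesheet.

```python
import re

# Sample document from the question.
DOC = """<!-- The quick brown fox
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->

<!-- Pack my box with five dozen liquor jugs -->
"""

# Capture group form of the greedy-prefix approach; group 1 holds the comment.
LAST = re.compile(r"(?s)^.*(<!--.+?stylesheet.+?-->)")

found = LAST.search(DOC)
comment = found.group(1) if found else ""

print(comment)
```

Because the overall match always starts at the beginning of the string, this approach yields at most one result per subject, which fits the remark about not matching globally.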


3.) Using a lookahead to narrow it down is usually more costly:

(?s)<!--(?:(?!<!).)+?stylesheet.+?-->

See the test at regex101. (?!<!). looks ahead at each character in between <!-- and stylesheet to check that it does not start another <! ..., so the match stays inside one element. This is similar to the negated hyphen solution.
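The effect of the per-character lookahead can be seen by comparing it with a plain lazy dot. This is a minimal sketch in Python, assuming its re module handles these constructs like PCRE; the variable names are illustration only.

```python
import re

# Sample document from the question.
DOC = """<!-- The quick brown fox
              jumps over the lazy dog -->

<!--[if IE 7]>
    <link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->

<!-- Pack my box with five dozen liquor jugs -->
"""

# Plain lazy dot: free to run from the first comment into the second one.
plain = re.search(r"(?s)<!--.+?stylesheet.+?-->", DOC)

# Dot guarded by (?!<!): stops as soon as another "<!" would begin.
tempered = re.search(r"(?s)<!--(?:(?!<!).)+?stylesheet.+?-->", DOC)

print(plain.group(0).startswith("<!-- The quick"))
print(tempered.group(0).startswith("<!--[if IE 7]>"))
```

The guarded version pays for one lookahead check per character, which is the extra cost mentioned above.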


Instead of .* I used .+ for one or more; which one to use depends on what is to be matched. Here + fits better.
Which solution to use depends on the exact requirements. For this case I would use the first.
