
Regex too greedy over multiple lines

I have the following code:

$text = "Lorem ipsum dolor sit amet, [b]consectetur adipiscing elit[/b]. 
Nunc lorem velit, lacinia ut commodo in, suscipit vitae magna. 
Nam imperdiet neque blandit semper tempus. 
Curabitur sapien ante, vestibulum vitae ante a, condimentum dignissim tortor. Aenean adipiscing tincidunt lorem, non eleifend tellus suscipit a. Nulla convallis [b]
pulvinar ligula[/b], at tempor ante. Fusce a tellus enim. Vivamus nibh eros, ultrices at auctor quis, fringilla nec dolor. Aenean nec tincidunt odio, id pulvinar felis. Pellentesque in augue volutpat, gravida nibh eu, lobortis augue.";

preg_match_all("#(\[b\].*\[/b\])#s", $text, $value);

My $value is returning everything from the first [b] to the last [/b]. I need it to match each pair individually.

As I understand it, I have to use the s modifier at the end to match across multiple lines, but then the * is too greedy. I can't just use a ?, as the number of characters can vary... What am I missing?

This is a common mistake. Quantifiers are greedy by default: starting from the leftmost position where a match can begin, .* consumes as much as it can and gives characters back only as far as needed for the rest of the pattern to succeed. With the s modifier, . also matches newlines, so .* runs all the way to the last [/b] in the text. Depending on the context, there are various possible solutions, but for engines that support Perl regex syntax (PHP's preg_* functions use PCRE), the easiest is generally to use the "non-greedy" (lazy) variant of the repetition operator you are using: *? instead of *, +? instead of +, ?? instead of ?, or {m,n}? instead of {m,n}.
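To make the greedy/lazy difference concrete, here is a minimal sketch that runs each greedy quantifier and its lazy counterpart against the same short string. The helper firstMatch() and the input 'aaaa' are hypothetical, chosen only for illustration; the sketch relies solely on PHP's built-in preg_match().

```php
<?php
// Hypothetical helper: return the leftmost match of $pattern in $subject, or null if none.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$subject = 'aaaa';

// Each greedy quantifier paired with its lazy ("?"-suffixed) variant.
$pairs = ['a*' => 'a*?', 'a+' => 'a+?', 'a?' => 'a??', 'a{1,3}' => 'a{1,3}?'];

foreach ($pairs as $greedy => $lazy) {
    printf(
        "%-7s => '%s'    %-8s => '%s'\n",
        $greedy, firstMatch("/$greedy/", $subject),
        $lazy, firstMatch("/$lazy/", $subject)
    );
}
```

Each greedy form takes as many characters as it is allowed to (a* takes all four, a{1,3} takes three), while each lazy form takes the minimum (a*? and a?? match the empty string, a+? and a{1,3}? match a single a).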

So in your example, the pattern should read as:

preg_match_all("#(\[b\].*?\[/b\])#s", $text, $value);
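A quick way to check the fix is to run both patterns side by side. This sketch assumes a shortened version of the question's $text in which, as in the original, the second [b]...[/b] pair is split across a line break; the variable names $greedy and $lazy are illustrative only.

```php
<?php
// Shortened version of the question's text; the second pair spans a newline.
$text = "Lorem ipsum dolor sit amet, [b]consectetur adipiscing elit[/b].
Nunc lorem velit, lacinia ut commodo in. Nulla convallis [b]
pulvinar ligula[/b], at tempor ante.";

// Greedy .* with the s modifier: a single match from the first [b] to the last [/b].
preg_match_all('#(\[b\].*\[/b\])#s', $text, $greedy);

// Lazy .*? with the s modifier: stops at the first [/b] after each [b], one match per pair.
preg_match_all('#(\[b\].*?\[/b\])#s', $text, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

The s modifier is still needed here, because without it . would not cross the newline inside the second pair and that pair would not be matched.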

Another way, which avoids the lazy quantifier altogether:

preg_match_all('~\[b](?>[^[]++|\[(?!/b]))*+\[/b]~', $text, $value);

With this approach, you avoid two problems:

  1. the greedy quantifier is not a problem, since the character class stops at each opening square bracket;
  2. since you don't use the dot, you don't need the s modifier to match across newlines.
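The sketch below runs the pattern without any modifier against a short hypothetical text and compares its result with the lazy .*? version. The sample text, including the nested [i]...[/i] tag that exercises the \[(?!/b]) branch, is an assumption made for illustration. The constructs (?>...) (atomic group) and the possessive quantifiers ++ and *+ are standard PCRE syntax.

```php
<?php
// Hypothetical sample: one plain pair, one pair containing another tag, one pair spanning a newline.
$text = "Lorem ipsum [b]consectetur adipiscing elit[/b].
Nunc [b]a [i]nested[/i] tag[/b] here. Nulla convallis [b]
pulvinar ligula[/b], at tempor ante.";

// [^[]++ consumes any run of characters other than "[" (newlines included, so no s modifier);
// \[(?!/b]) lets a "[" through only when it does not start the closing [/b].
$pattern = '~\[b](?>[^[]++|\[(?!/b]))*+\[/b]~';
preg_match_all($pattern, $text, $value);

// Reference result from the lazy version, which does need the s modifier.
preg_match_all('#\[b\].*?\[/b\]#s', $text, $lazy);

var_dump($value[0] === $lazy[0]);
print_r($value[0]);
```

Because the class [^[] cannot run past a "[", the possessive quantifiers never have to give anything back, which is why the pattern can forbid backtracking without losing matches.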
