Regex too greedy across multiple lines

Question

I have the following code:

$text = "Lorem ipsum dolor sit amet, [b]consectetur adipiscing elit[/b]. 
Nunc lorem velit, lacinia ut commodo in, suscipit vitae magna. 
Nam imperdiet neque blandit semper tempus. 
Curabitur sapien ante, vestibulum vitae ante a, condimentum dignissim tortor. Aenean adipiscing tincidunt lorem, non eleifend tellus suscipit a. Nulla convallis [b]
pulvinar ligula[/b], at tempor ante. Fusce a tellus enim. Vivamus nibh eros, ultrices at auctor quis, fringilla nec dolor. Aenean nec tincidunt odio, id pulvinar felis. Pellentesque in augue volutpat, gravida nibh eu, lobortis augue.";

preg_match_all("#(\[b\].*\[/b\])#s", $text, $value);

My $value contains everything from the first [b] to the last [/b]. I need it to match each pair individually.

As I understand it, I have to use the s modifier at the end so that the match can span multiple lines, but then the * is too greedy. I can't just use ?, as the number of characters can vary... what am I missing?

Answer 1

This is a common mistake. Quantifiers such as * are greedy by default: the regex engine starts at the leftmost position where the pattern can match, and .* then consumes as much as it can, giving characters back only as far as needed for \[/b\] to match. That is why your match runs on to the last [/b] in the text. Depending on the context, there might be various possible solutions, but for engines that support Perl regex syntax (PHP's preg_* functions use PCRE), the easiest is generally to use the "non-greedy" (lazy) variant of the repetition operator you are using. That is, *? instead of *, +? instead of +, ?? instead of ?, or {m,n}? instead of {m,n}. Unlike a bare ?, which allows at most one character, *? still matches any number of characters; it just takes as few as possible.
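A small self-contained PHP sketch comparing each greedy quantifier listed above with its lazy counterpart. The patterns and subject strings are made up for illustration; preg_match keeps only the first match, which is where the difference shows.

```php
<?php
// Illustrative patterns and subjects (not from the question): each greedy
// quantifier is paired with its lazy counterpart.
$cases = [
    ['/<.*>/',    '<a><b>'],  // greedy *      -> "<a><b>"
    ['/<.*?>/',   '<a><b>'],  // lazy *?       -> "<a>"
    ['/a+/',      'aaa'],     // greedy +      -> "aaa"
    ['/a+?/',     'aaa'],     // lazy +?       -> "a"
    ['/ab?/',     'abc'],     // greedy ?      -> "ab"
    ['/ab??/',    'abc'],     // lazy ??       -> "a"
    ['/a{2,4}/',  'aaaaa'],   // greedy {2,4}  -> "aaaa"
    ['/a{2,4}?/', 'aaaaa'],   // lazy {2,4}?   -> "aa"
];

$results = [];
foreach ($cases as [$pattern, $subject]) {
    // preg_match stops at the first match and stores it in $m[0].
    preg_match($pattern, $subject, $m);
    $results[$pattern] = $m[0];
    printf("%-10s on %-7s => %s\n", $pattern, $subject, $m[0]);
}
```

In every pair the lazy form still has to satisfy the rest of the pattern; it simply stops at the first point where it can.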

So in your example, the pattern should read:

preg_match_all("#(\[b\].*?\[/b\])#s", $text, $value);
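A runnable sketch of the fix, using a shortened copy of the question's $text (abbreviated here for illustration) so the greedy and lazy results can be compared side by side. It also runs the lazy pattern without s to show why that modifier is still needed.

```php
<?php
// Shortened version of the question's $text: the second [b]...[/b] pair
// spans a line break, so the s (dot matches newline) modifier matters.
$text = "Lorem ipsum dolor sit amet, [b]consectetur adipiscing elit[/b].\n"
      . "Nulla convallis [b]\npulvinar ligula[/b], at tempor ante.";

// Greedy: .* runs on to the LAST [/b], so there is only one match.
preg_match_all("#(\[b\].*\[/b\])#s", $text, $greedy);

// Lazy: .*? stops at the FIRST [/b] after each [b], so each pair matches separately.
preg_match_all("#(\[b\].*?\[/b\])#s", $text, $lazy);

// Lazy but without s: . cannot cross the newline after the second [b].
preg_match_all("#(\[b\].*?\[/b\])#", $text, $noDotall);

echo count($greedy[1]), " greedy match(es)\n";      // 1
echo count($lazy[1]), " lazy match(es)\n";          // 2
echo count($noDotall[1]), " lazy match(es) without s\n"; // 1
print_r($lazy[1]);
```

So the lazy quantifier and the s modifier solve two different problems: *? separates the pairs, while s lets a pair span a line break.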

Answer 2

Another way, which avoids the lazy quantifier altogether:

preg_match_all('~\[b](?>[^[]++|\[(?!/b]))*+\[/b]~', $text, $value);

This way, you avoid two problems:

- The greedy quantifier is not a problem, since the character class [^[] stops at each opening square bracket, and the lookahead (?!/b]) only lets the match step past a [ that does not start the closing [/b].
- Since you don't use the dot, you don't need the s modifier to handle newlines.
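A sketch running Answer 2's pattern on a made-up sample that also contains an unrelated [note] bracket inside the first bold run, to show that a lone [ does not end the match. This pattern has no capturing group, so the matches are in $value[0].

```php
<?php
// Made-up sample: the first bold run contains an extra "[note]" bracket,
// and the second pair spans a line break.
$text = "Lorem ipsum [b]consectetur [note] elit[/b].\n"
      . "Nulla convallis [b]\npulvinar ligula[/b], at tempor ante.";

// (?>[^[]++|\[(?!/b]))*+ works as follows:
//   [^[]++      consumes a run of characters that are not "[" (newlines included)
//   \[(?!/b])   consumes a single "[" only if it does not start "[/b]"
// The quantifiers are possessive, so the engine never backtracks into them.
preg_match_all('~\[b](?>[^[]++|\[(?!/b]))*+\[/b]~', $text, $value);

print_r($value[0]);
```

On this sample the result is the same two strings that the lazy pattern #\[b\].*?\[/b\]#s returns, but no s modifier is involved.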

2 answers

Answer 1: score 2, accepted, 2013-10-23 20:11:38

Answer 2: score 1, 2013-10-23 20:23:19
