About PHP regexp for recursive pattern

I have this code:

$string="some text {@block}outside{@block}inside{@}outside{@} other text";

function catchPattern($string,$layer){
  preg_match_all(
    "/\{@block\}".
      "(".
        "(".
           "[^()]*|(?R)".
        ")*".
      ")".
    "\{@\}/",$string,$nodes);
  if(count($nodes)>1){
    for($i=0;$i<count($nodes[1]); $i++){
      if(is_string($nodes[1][$i])){
        if(strlen($nodes[1][$i])>0){
          echo "<pre>Layer ".$layer.":   ".$nodes[1][$i]."</pre><br />";
          catchPattern($nodes[1][$i],$layer+1);
        }
      }
    }
  }
}

catchPattern($string,0);

That gives me this output:

Layer 0:   outside{@block}inside{@}outside

Layer 1:   inside

And that's all fine! But if I change the string and the regexp a bit:

$string="some text {@block}outside{@block}inside{@end}outside{@end} other text";

function catchPattern($string,$layer){
  preg_match_all(
    "/\{@block\}".
      "(".
        "(".
           "[^()]*|(?R)".
        ")*".
      ")".
    "\{@end\}/",$string,$nodes);
  if(count($nodes)>1){
    for($i=0;$i<count($nodes[1]); $i++){
      if(is_string($nodes[1][$i])){
        if(strlen($nodes[1][$i])>0){
          echo "<pre>Layer ".$layer.":   ".$nodes[1][$i]."</pre><br />";
          catchPattern($nodes[1][$i],$layer+1);
        }
      }
    }
  }
}

catchPattern($string,0);

I didn't get any output. Why? I expected the same output.

The problem is that the backtracking limit is exhausted. You can always raise the backtracking limit (the pcre.backtrack_limit ini setting). However, for the cases I have come across, rewriting the regex is the better solution.
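One way to confirm this diagnosis is to check preg_last_error() after the call, because preg_match_all() returns false instead of raising an error when a PCRE limit is hit. The sketch below is only an assumption about how that check might be wrapped: the helper names pregErrorName and matchAllChecked are hypothetical, and the ini_set line is left as a comment because a suitable limit depends on the input.

```php
<?php
// Hypothetical helper: map a preg_last_error() code to a readable name.
function pregErrorName(int $code): string {
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    ];
    return $names[$code] ?? 'unknown PCRE error (' . $code . ')';
}

// Hypothetical wrapper: run preg_match_all and report a failure
// instead of silently producing no output.
function matchAllChecked(string $pattern, string $subject): array {
    $result = preg_match_all($pattern, $subject, $matches);
    if ($result === false) {
        throw new RuntimeException(pregErrorName(preg_last_error()));
    }
    return $matches;
}

// The limit can be raised if rewriting the regex is not an option, e.g.:
// ini_set('pcre.backtrack_limit', '10000000');
```

Calling matchAllChecked() in place of preg_match_all() inside catchPattern would turn the silent empty result into an exception naming the PCRE error.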

You can't just modify an existing regex arbitrarily and expect it to work, especially a recursive one. It seems that you took an existing bracket-matching regex and modified it. There are a few problems in your regex:

  • [^()]* : There is no reason to exclude ( and ) from the text within the {@block}{@end} portion. But the more severe problem is that it matches { and } . The engine will go all the way to the nearest ( or ) or to the end of the string, fail to match, then backtrack. This is why the backtracking limit is reached.

    This can be fixed by changing this portion to [^{}] to disallow { and } inside {@block}{@end} . Nested {@block}{@end} pairs will still be matched, thanks to the recursion.

    Note that this completely disallows { and } as text within {@block}{@end} . It may be possible to modify the regex to allow such cases, depending on the escaping scheme.

    I also changed the quantifier of [^{}] from * to + , since there is no reason to match an empty string when the quantifier of the whole group ([^{}]+|(?R)) is * .

     /\{@block\}((?:[^{}]+|(?R))*)\{@end\}/
  • After the modification above, the second problem is with invalid input strings. The default behavior of a quantifier is to backtrack until a match is found or all possibilities are exhausted. Therefore, you will still reach the backtracking limit in those cases.

    Since what [^{}]+ can match and what the recursive regex can match are mutually exclusive [1], the regex is unambiguous and can be matched without backtracking. We can tell the engine not to backtrack by using possessive quantifiers, which are the normal quantifiers with a + appended.
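To illustrate the second point, the sketch below runs a possessive form of the pattern against an invalid input: an opening {@block} that is never closed. The variable names and the 5000-character filler are assumptions chosen for the demonstration. Because the possessive groups never give back what they consumed, the attempt at each start position fails after a single pass over the text.

```php
<?php
// Possessive version: [^{}]++ and (...)*+ never give back what they matched.
$possessive = '/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/';

// Invalid input (assumed for illustration): the block is never closed.
$unclosed = '{@block}' . str_repeat('a', 5000);

$count = preg_match($possessive, $unclosed, $m);

// 0 means "no match"; false would mean a PCRE limit or error was hit.
var_dump($count);
var_dump(preg_last_error() === PREG_NO_ERROR);
```

A return value of 0 together with PREG_NO_ERROR indicates a clean "no match" and not a failure caused by a limit.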

The final solution is:

/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/
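As a minimal sketch of how this pattern could be dropped into the code from the question, the version below collects the layers into an array instead of echoing HTML. The function name collectBlocks and the shape of the returned array are assumptions made for this example, not part of the original code.

```php
<?php
// Collect the contents of every {@block}...{@end} pair, outermost first.
// Hypothetical rewrite of catchPattern: it returns data instead of echoing.
function collectBlocks(string $text, int $layer = 0): array {
    $pattern = '/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/';
    $found = [];
    if (preg_match_all($pattern, $text, $nodes) === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    foreach ($nodes[1] as $inner) {
        if ($inner === '') {
            continue; // skip empty blocks, as the original code does
        }
        $found[] = ['layer' => $layer, 'text' => $inner];
        // Recurse into the captured text to find nested blocks.
        $found = array_merge($found, collectBlocks($inner, $layer + 1));
    }
    return $found;
}

$string = 'some text {@block}outside{@block}inside{@end}outside{@end} other text';
foreach (collectBlocks($string) as $block) {
    echo "Layer {$block['layer']}:   {$block['text']}\n";
}
```

For the second string from the question this should report layer 0 as outside{@block}inside{@end}outside and layer 1 as inside, which mirrors the output of the first version.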

Footnotes

[1]: This is quite obvious, since text matching [^{}]+ will never start with { , while text matching the recursive regex must start with { .
