PHP PCRE: allowing nested patterns (recursion) in a string

Question

I have a string like 1(8()3(6()7())9()3())2(4())3()1(0()3()) which represents a tree. A bracket opens whenever we go one level deeper. Numbers on the same level are neighbours.

Now I want to add nodes. For example, I want to add a 5 to every path that has a 1 on the first level and a 3 on the second level, so I want to put a 5() after every 3( that is inside a 1( . So 5() has to be added 3 times, and the result should be 1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))

The problem is that I can't get the code working with PCRE recursion. If I match a tree representation string without any fixed paths like 1( and 3( it works, but when I generate a regex with those fixed patterns, it doesn't.

This is my code:

<?php
header('content-type: text/plain;utf-8');

$node = [1, 3, 5];
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';

echo $path.'
';

$nes = '\((((?>[^()]+)|(?R))*)\)';
$nes = '('.$nes.')*';

echo preg_match('/'.$nes.'/x', $path) ? 'matches' : 'matches not';
echo '
';

// creates a regex with the fixed path structure, but allows nested elements in between
// in this example something like: /^anyNestedElementsHere 1( anyNestedElementsHere 3( anyNestedElementsHere ))/
$re = $nes;
for ($i = 0; $i < count($node)-1; $i++) {
    $re .= $node[$i].'\(';
    if ($i != count($node)-2)
        $re .= $nes;
}
$re = '/^('.$re.')/x';

echo str_replace($nes, '   '.$nes.'   ', $re).'
';
echo preg_match($re, $path) ? 'matches' : 'matches not';
echo '
';
// append 5()
echo preg_replace($re, '${1}'.$node[count($node)-1].'()', $path);
?>

And this is the output, where you can see what the generated regex looks like:

1(8()3(6()7())9()3())2(4())3()1(0()3())
matches
/^(   (\((((?>[^()]+)|(?R))*)\))*   1\(   (\((((?>[^()]+)|(?R))*)\))*   3\()/x
matches not
1(8()3(6()7())9()3())2(4())3()1(0()3())

I hope you understand my problem and can tell me where my error is.

Thanks a lot!

Answer 1

Solution

Regex

The following regex matches nested brackets recursively, finding an opening 1( on the first level and an opening 3( on the second level (as a direct child). It also attempts successive matches, either staying on the same level or moving back through the respective levels to find another match.

~
(?(?=\A)  # IF: First match attempt (if at start of string)   - -

  # we are on 1st level => find next "1("

  (?<balanced_brackets>
    # consumes balanced brackets recursively where there is no match
    [^()]*+
    \(  (?&balanced_brackets)*?  \)
  )*?

  # match "1(" => enter level 2
  1\(

|         # ELSE: Successive matches  - - - - - - - - - - - - - -

  \G    # Start at end of last match (level 3)

  # Go down to level 2 - match ")"
  (?&balanced_brackets)*?
  \)

  # or go back to level 1 - matching another ")"
  (?>
    (?&balanced_brackets)*?
    \)

    # and enter level 2 again
    (?&balanced_brackets)*?
    1\(
  )*?
)                                      # - - - - - - - - - - - -

# we are on level 2 => consume balanced brackets and match "3("
(?&balanced_brackets)*?
3\K\(  # also reset the start of the match
~x

Replacement

(5()

Text

Input:
1(8()3(6()7())9()3())2(4())3()1(0()3())

Output:
1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
       ^^^            ^^^                  ^^^
       [1]            [2]                  [3]

regex101 demo

How it works

We start by using a conditional subpattern to distinguish between:

the first match attempt (from level 1) and
the successive attempts (starting at level 3, anchored with the \G assertion).

(?(?=\A)  # IF followed by start of string
    # This is the first attempt
|         # ELSE
    # This is another attempt
    \G    # and we'll anchor it to the end of last match
)
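
To see the branch selection in isolation, here is a minimal sketch on a toy subject. The letters a and b and the subject string are placeholders chosen for illustration, not part of the regex above. The first attempt starts at the beginning of the string, so the IF branch runs; every later attempt has to continue exactly where the previous match ended, because of \G.

```php
<?php
// Toy pattern: "a" on the first attempt, then "b" only if it directly
// follows the end of the previous match.
$re = '~(?(?=\A)a|\Gb)~';

preg_match_all($re, 'abbxb', $m);

// The chain stops at "x": the final "b" is not contiguous with the
// last match, so \G fails there.
print_r($m[0]);
```

The same mechanism is what should keep the real pattern from jumping to an unrelated 3( somewhere later in the string.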

For the first match, we'll consume all nested brackets that don't match 1( in order to get the cursor to a position on the first level where it could find a successful match.

This is a well-known recursive pattern to match nested constructs. If you're unfamiliar with it, please refer to Recursion and Subroutines.

(?<balanced_brackets>        # ANY NUMBER OF BALANCED BRACKETS
  [^()]*+                    # match any characters 
  \(                         # opening bracket
    (?&balanced_brackets)*?  #  with nested bracket (recursively)
  \)                         # closing bracket in the main level
)*?                          # Repeated any times (lazy)

Notice this is a named group that we will call as a subroutine many times in the pattern to consume unwanted balanced brackets, as (?&balanced_brackets)*?
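
As a quick check of the group on its own, the sketch below reuses it in a standalone pattern. The \A and \z anchors and the greedy * are additions for this demo only, so that the whole subject has to be a sequence of balanced items; the two test strings are made up.

```php
<?php
// Same group body as in the answer; the anchors are only for this demo.
$re = '~\A(?<balanced_brackets>[^()]*+\((?&balanced_brackets)*?\))*\z~';

$balanced   = preg_match($re, '8()3(6()7())'); // every "(" is closed
$unbalanced = preg_match($re, '8()3(6()7()');  // the last ")" is missing

var_dump($balanced, $unbalanced);
```

The recursion happens where the group calls itself between \( and \), which is how each nesting level gets consumed.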

Next levels. Now, to enter level 2, we need to match:

1\(

And finally, we'll consume any balanced brackets until we find the opening of the 3rd level:

(?&balanced_brackets)*?
3\(

That's it. We've just matched our first occurrence, so we can insert the replacement text at that position.

Next match. For the successive match attempts, we can either:

go down to level 2 by matching a closing ) to find another occurrence of 3(
go further down to level 1 by matching 2 closing ) and, from there, match using the same strategy as we used for the first match.

This is achieved with the following subpattern:

\G                             # anchored to the end of last match (level 3)
(?&balanced_brackets)*?        # consume any balanced brackets
\)                             # go down to level 2
                               #
(?>                            # And optionally
  (?&balanced_brackets)*?      #   consume level 2 brackets
  \)                           #   to go down to level 1
  (?&balanced_brackets)*?      #   consume level 1 brackets
  1\(                          #   and go up to level 2 again
)*?                            # As many times as it needs to (lazy)

To conclude, we can match the opening of the 3rd level:

(?&balanced_brackets)*?
3\(

We'll also reset the start of the match near the end of the pattern, with \K, so that only the last opening bracket is matched. Thus, we can simply replace with (5() and avoid the use of backreferences.
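
A small sketch of that trade-off, on a shortened subject picked for illustration: with \K only the ( is part of the reported match, while the variant without \K has to capture the prefix and put it back.

```php
<?php
$subject = '3()';

// With \K: everything matched before \K is dropped from the reported
// match, so only "(" gets replaced.
$withK = preg_replace('~3\K\(~', '(5()', $subject);

// Without \K: capture "3(" and re-insert it with a backreference.
$withBackref = preg_replace('~(3\()~', '${1}5()', $subject);

var_dump($withK, $withBackref);
```

Both calls should produce the same string; \K just keeps the replacement free of group references.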

PHP Code

We only need to call preg_replace() with the same values used above.
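
Put together, the call could look like the sketch below. The variable names $re, $path and $result are chosen here for illustration; the pattern and the replacement are the ones from the Regex and Replacement sections, with the comments trimmed.

```php
<?php
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';

$re = '~
(?(?=\A)
  # first attempt: skip balanced brackets on level 1, then enter level 2
  (?<balanced_brackets>
    [^()]*+
    \(  (?&balanced_brackets)*?  \)
  )*?
  1\(
|
  # later attempts: continue from the end of the previous match
  \G
  (?&balanced_brackets)*?
  \)
  (?>
    (?&balanced_brackets)*?
    \)
    (?&balanced_brackets)*?
    1\(
  )*?
)
# on level 2: skip balanced brackets and match the next 3, then its bracket
(?&balanced_brackets)*?
3\K\(
~x';

$result = preg_replace($re, '(5()', $path);

echo $result, "\n";
```

Because the pattern is delimited with ~ and kept in a single-quoted string, the backslashes reach PCRE unchanged.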

Ideone demo

Why did your regex fail?

Since you asked: the pattern is anchored to the start of the string, so it can only match the first occurrence.

/^(   (\((((?>[^()]+)|(?R))*)\))*   1\(   (\((((?>[^()]+)|(?R))*)\))*   3\()/x

Also, it doesn't match the first occurrence because the construct (?R) recurses the whole pattern (trying to match ^ again). We could change (?R) to (?2) .
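
The effect of recursing into an anchored pattern can be reproduced on a much smaller case. The two patterns below are simplified stand-ins written for this sketch, not the generated regex from the question, and the group number differs accordingly ((?1) here instead of (?2)).

```php
<?php
$subject = '(())';

// (?R) re-enters the whole pattern, including ^, which cannot match in
// the middle of the subject, so the inner "()" is never consumed.
$wholePattern = preg_match('~^(\((?R)*\))$~', $subject);

// (?1) re-enters only group 1, leaving the anchors outside the recursion.
$groupOnly = preg_match('~^(\((?1)*\))$~', $subject);

var_dump($wholePattern, $groupOnly);
```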

The main reason, though, is that it is not consuming the characters before any opening \( . For example:

Input:
1(8()3(6()7())9()3())2(4())3()1(0()3())
  ^
  #this "8" can't be consumed with the pattern

There's also a behaviour that should be considered: PCRE treats recursion as atomic. So you have to make sure that the pattern consumes unwanted brackets as in the above example, but also avoids matching 1( or 3( in their respective levels.

Answer 2

I'd break down this problem into two smaller parts:

First, extract the 1 nodes, using the following regex:

(?(DEFINE)
  (?<tree>
    (?: \d+ \( (?&tree) \) )*
  )
)
\b 1 \( (?&tree) \)

Demo

Use preg_replace_callback for this. This will match 1(8()3(6()7())9()3()) and 1(0()3()) .

Next, it's just a matter of replacing 3( with 3(5() and you're done.

Example in PHP:

$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';

$path = preg_replace_callback('#
    (?(DEFINE)
      (?<tree>
        (?: \d+ \( (?&tree) \) )*
      )
    )
    \b 1 \( (?&tree) \)
#x', function($m) {
    return str_replace('3(', '3(5()', $m[0]);
}, $path);

The result is: 1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
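
One thing that may be worth checking before reusing this: str_replace works on the whole matched subtree, so it appears to add 5() after every 3( at any depth inside a 1( , not only on the second level. The sketch below wraps the code above in a hypothetical helper, addFiveAfterThree(), and feeds it a made-up input where the 3 sits on the third level.

```php
<?php
// Hypothetical helper wrapping the answer's code so it can be reused.
function addFiveAfterThree(string $path): string
{
    return preg_replace_callback('#
        (?(DEFINE)
          (?<tree>
            (?: \d+ \( (?&tree) \) )*
          )
        )
        \b 1 \( (?&tree) \)
    #x', function ($m) {
        // Plain text replacement inside the matched "1(...)" subtree.
        return str_replace('3(', '3(5()', $m[0]);
    }, $path);
}

// Here the path is 1 > 2 > 3, yet the 3 still receives a 5().
echo addFiveAfterThree('1(2(3()))'), "\n";
```

For the input in the question this makes no difference, since every 3( inside a 1( is a direct child there.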
