PHP PCRE allow nested patterns (recursion) in a string
I have a string like
1(8()3(6()7())9()3())2(4())3()1(0()3())
which represents a tree. A bracket appears whenever we go one level deeper. Numbers on the same level are neighbours.
Now I want to add nodes. For example, I want to add a 5 to every path that has a 1 on the first level and a 3 on the second level, so I want to put a 5() after every 3( that is inside a 1(. So 5() has to be added 3 times, and the result should be:
1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
The problem is that I can't get the code working with PCRE recursion. If I match a tree representation string without any fixed paths like 1( and 3(, it works, but when I generate a regex with those fixed patterns, it doesn't.
This is my code:
<?php
header('content-type: text/plain;utf-8');
$node = [1, 3, 5];
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';
echo $path."\n";
$nes = '\((((?>[^()]+)|(?R))*)\)';
$nes = '('.$nes.')*';
echo preg_match('/'.$nes.'/x', $path) ? 'matches' : 'matches not';
echo "\n";
// creates a regex with the fixed path structure, but allows nested elements in between
// in this example something like: /^anyNestedElementsHere 1( anyNestedElementsHere 3( anyNestedElementsHere ))/
$re = $nes;
for ($i = 0; $i < count($node)-1; $i++) {
    $re .= $node[$i].'\(';
    if ($i != count($node)-2)
        $re .= $nes;
}
$re = '/^('.$re.')/x';
echo str_replace($nes, ' '.$nes.' ', $re)."\n";
echo preg_match($re, $path) ? 'matches' : 'matches not';
echo "\n";
// append 5()
echo preg_replace($re, '${1}'.$node[count($node)-1].'()', $path);
?>
And this is the output, which also shows what the generated regex looks like:
1(8()3(6()7())9()3())2(4())3()1(0()3())
matches
/^( (\((((?>[^()]+)|(?R))*)\))* 1\( (\((((?>[^()]+)|(?R))*)\))* 3\()/x
matches not
1(8()3(6()7())9()3())2(4())3()1(0()3())
I hope you understand my problem and can tell me where my error is.
Thanks a lot!
Regex
The following regex matches nested brackets recursively, finding an opening 1( on the first level and an opening 3( on the second level (as a direct child). It also attempts successive matches, either on the same level or by going down the respective levels to find another match.
~
(?(?=\A)  # IF: First match attempt (if at start of string)   - -
    # we are on 1st level => find next "1("
    (?<balanced_brackets>
        # consumes balanced brackets recursively where there is no match
        [^()]*+
        \(  (?&balanced_brackets)*?  \)
    )*?
    # match "1(" => enter level 2
    1\(
  |       # ELSE: Successive matches  - - - - - - - - - - - - - -
    \G    # Start at end of last match (level 3)
    # Go down to level 2 - match ")"
    (?&balanced_brackets)*?
    \)
    # or go back to level 1 - matching another ")"
    (?>
        (?&balanced_brackets)*?
        \)
        # and enter level 2 again
        (?&balanced_brackets)*?
        1\(
    )*?
)                                      # - - - - - - - - - - - -
# we are on level 2 => consume balanced brackets and match "3("
(?&balanced_brackets)*?
3\K\(     # also reset the start of the match
~x
Replacement
(5()
Text
Input:
1(8()3(6()7())9()3())2(4())3()1(0()3())
Output:
1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
       ^^^            ^^^                  ^^^
       [1]            [2]                  [3]
We start by using a conditional subpattern to distinguish between the first match attempt (at the start of the string) and the successive attempts (anchored to the end of the last match with the \G assertion):
(?(?=\A)  # IF followed by start of string
    # This is the first attempt
  |       # ELSE
    # This is another attempt
    \G    # and we'll anchor it to the end of last match
)
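As a small illustration of this IF/ELSE structure, here is a sketch with a toy pattern and a made-up subject string (neither comes from the question). The first attempt has to match a at the start of the string, and every later attempt has to match b exactly where the previous match ended.

```php
<?php
// Toy pattern (hypothetical, for illustration only):
//   IF we are at the start of the string, match "a"
//   ELSE match "b", but only right where the previous match ended (\G)
$pattern = '~(?(?=\A) a | \G b )~x';
$subject = 'abbcb'; // made-up input

preg_match_all($pattern, $subject, $m);

// The final "b" is not matched: the "c" breaks the \G chain.
print_r($m[0]);
```

Because \G only succeeds at the offset where the previous match ended, the engine cannot skip ahead to an unrelated position, which is what keeps the real pattern aware of its current nesting level.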
For the first match, we'll consume all nested brackets that don't match 1(, in order to get the cursor to a position in the first level where it could find a successful match. This is done with recursion and subroutines:
(?<balanced_brackets>           # ANY NUMBER OF BALANCED BRACKETS
    [^()]*+                     # match any non-bracket characters
    \(                          # opening bracket
    (?&balanced_brackets)*?     # with nested bracket (recursively)
    \)                          # closing bracket in the main level
)*?                             # Repeated any times (lazy)
Notice this is a named group that we will call as a subroutine many times in the pattern to consume unwanted balanced brackets, as (?&balanced_brackets)*?.
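To see what a single call to this group consumes, the following sketch applies the named group on its own (without the outer lazy quantifier) to the string from the question. The variable names are only for illustration.

```php
<?php
// The named group from the answer, used once so that each match
// is a single balanced "label(...)" unit.
$unit = '~(?<balanced_brackets>[^()]*+\((?&balanced_brackets)*?\))~';
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';

preg_match_all($unit, $path, $m);

// Each match is one complete top-level subtree of $path.
print_r($m[0]);
```

Each match is a whole subtree, so repeating the group skips over siblings without ever stopping inside one of them.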
Next levels. Now, to enter level 2, we need to match:
1\(
And finally, we'll consume any balanced brackets until we find the opening of the 3rd level:
(?&balanced_brackets)*?
3\(
That's it. We've just matched our first occurrence, so we can insert the replacement text at that position.
Next match. For the successive match attempts, we can either go down to level 2 (matching one closing bracket) to find another occurrence of 3(, or go further down to level 1 (matching another closing bracket) and, from there, match using the same strategy as we used for the first match. This is achieved with the following subpattern:
\G                              # anchored to the end of last match (level 3)
(?&balanced_brackets)*?         # consume any balanced brackets
\)                              # go down to level 2
                                #
(?>                             # And optionally
    (?&balanced_brackets)*?     #   consume level 2 brackets
    \)                          #   to go down to level 1
    (?&balanced_brackets)*?     #   consume level 1 brackets
    1\(                         #   and go up to level 2 again
)*?                             # As many times as it needs to (lazy)
To conclude, we can match the opening of the 3rd level:
(?&balanced_brackets)*?
3\(
We'll also reset the start of the match near the end of the pattern, with \K, so that only the last opening bracket is matched. Thus, we can simply replace with (5(), avoiding the use of backreferences.
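A minimal sketch of the difference, using a made-up one-node subject: with \K only the bracket is part of the reported match, whereas the alternative captures 3( and re-inserts it with a backreference.

```php
<?php
$subject = '3()'; // minimal made-up input

// With \K: only the "(" is part of the reported match, so the "3" is kept.
$withK = preg_replace('~3\K\(~', '(5()', $subject);

// Without \K: capture "3(" and re-insert it with a backreference.
$withBackref = preg_replace('~(3\()~', '${1}5()', $subject);

var_dump($withK === $withBackref); // both approaches give the same string
echo $withK, "\n";
```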
We only need to call preg_replace() with the same values used above.
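Put together, the call could look like the sketch below. The pattern and replacement are the ones from this answer, with the comments shortened; the variable names $regex and $result are only for illustration.

```php
<?php
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';

$regex = '~
    (?(?=\A)                          # IF: first attempt (start of string)
        (?<balanced_brackets>
            [^()]*+
            \( (?&balanced_brackets)*? \)
        )*?
        1\(
      |                               # ELSE: successive attempts
        \G
        (?&balanced_brackets)*?
        \)
        (?>
            (?&balanced_brackets)*?
            \)
            (?&balanced_brackets)*?
            1\(
        )*?
    )
    (?&balanced_brackets)*?
    3\K\(
~x';

// Every matched "(" (the one right after a level-2 "3") becomes "(5()".
$result = preg_replace($regex, '(5()', $path);
echo $result, "\n";
```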
Why did your regex fail?
Since you asked, the pattern is anchored to the start of the string. It can only match the first occurrence.
/^( (\((((?>[^()]+)|(?R))*)\))* 1\( (\((((?>[^()]+)|(?R))*)\))* 3\()/x
Also, it doesn't match the first occurrence because the construct (?R) recurses the whole pattern (trying to match ^ again). We could change (?R) to (?2).
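The effect of recursing into an anchored pattern can be reproduced with a much smaller, made-up example. This sketch uses a numbered group (?1) rather than the (?2) of the generated regex, simply because the toy pattern has only one group.

```php
<?php
$subject = '(())'; // made-up input with one nested pair

// (?R) re-runs the WHOLE pattern, including ^, which cannot match
// in the middle of the string, so the nested pair is never consumed.
$wholePattern = preg_match('~^\((?R)*\)~', $subject);

// (?1) re-runs only group 1, so the anchor is not part of the recursion.
$groupOnly = preg_match('~^(\((?1)*\))~', $subject);

var_dump($wholePattern, $groupOnly);
```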
The main reason, though, is that it is not consuming the characters before any opening \(. For example:
Input:
1(8()3(6()7())9()3())2(4())3()1(0()3())
  ^
  #this "8" can't be consumed with the pattern
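The same failure shows up on a tiny made-up fragment. In this sketch both patterns are simplified stand-ins, not the generated regex itself: the first one only skips units that begin with a bracket, the second one lets each unit begin with a label, as balanced_brackets does.

```php
<?php
$subject = '8()3('; // made-up fragment: a sibling "8()" before "3("

// Each skipped unit must START with "(", so the label "8" blocks the match.
$bracketsOnly = '~^ (\( (?: [^()]++ | (?1) )* \))* 3\( ~x';

// Each skipped unit may start with a label before its opening bracket.
$withLabel = '~^ ([^()]*+ \( (?1)*? \))*? 3\( ~x';

$failed    = preg_match($bracketsOnly, $subject);
$succeeded = preg_match($withLabel, $subject);

var_dump($failed, $succeeded);
```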
There's also a behaviour that should be considered: PCRE treats recursion as atomic. So you have to make sure that the pattern consumes unwanted brackets as in the above example, but also avoids matching 1( or 3( in their respective levels.
I'd break this problem down into two smaller parts:
First, extract the 1 nodes, using the following regex:
(?(DEFINE)
    (?<tree>
        (?: \d+ \( (?&tree) \) )*
    )
)
\b 1 \( (?&tree) \)
Use preg_replace_callback for this. This will match 1(8()3(6()7())9()3()) and 1(0()3()).
Next, it's just a matter of replacing 3( with 3(5() and you're done.
Example in PHP:
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';
$path = preg_replace_callback('#
    (?(DEFINE)
        (?<tree>
            (?: \d+ \( (?&tree) \) )*
        )
    )
    \b 1 \( (?&tree) \)
#x', function($m) {
    return str_replace('3(', '3(5()', $m[0]);
}, $path);
The result is:
1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))