Regex (PCRE): recursive expression for matching nested text

Question

So I am trying to match something like this (yes, including newlines):

Match #1

START
    START
        stuff
    STOP
    more stuff
STOP

Match #2

START
    START
        stuff
    STOP
    more stuff
STOP

This is how far I have come:

START(.*?^(?:(?!STOP).)*$|(?R))|STOP with the flags "g", "m", "i" and "s"

The problem is that I cannot match anything after the STOP without matching the last "STOP" in the entire text.

Here is a regex101 example:

https://regex101.com/r/vD4nX6/1

I would appreciate some guidance.

Thanks in advance.

Answer 1

Here's a pattern that matches your example:

^\h*START\h*\n(?:\h*+(?!(?:START|STOP)\h*$)[^\n]*\n|(?R)\n)*\h*STOP\h*$

using the /mg flags (live at https://regex101.com/r/iK9tK5/1 ).

The idea behind it:

^                                  # beginning of line
\h* START \h* \n                   # "START" optionally surrounded by horizontal whitespace
                                   #   on a line of its own
(?:                                # between START/STOP, every line is either "normal"
                                   #   or a recursive START/STOP block
    \h*+                           # a normal line starts with optional horizontal whitespace
    (?!                            #   ... not followed by ...
        (?: START | STOP ) \h* $   #   "START" or "STOP" on their own
    )
    [^\n]* \n                      # any characters, then a newline
|
    (?R) \n                        # otherwise it's a recursive START/STOP block
)*                                 # we can have as many items as we want between START/STOP
\h* STOP \h*                       # "STOP" optionally surrounded by horizontal whitespace
$                                  # end of line
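
For cross-checking the pattern's results outside a PCRE engine, here is a rough procedural sketch of the same line-by-line logic in Python. It is not a port of the regex: Python's standard re module has no (?R), so a depth counter is swapped in for the recursion. The function name find_blocks is made up for this illustration, only space and tab are treated as horizontal whitespace, and the input is assumed to have balanced START/STOP lines.

```python
def find_blocks(text):
    """Return each top-level START...STOP block as one string.

    Nested blocks stay inside their parent block. Assumes balanced
    markers; a STOP line seen at depth 0 is simply ignored.
    """
    blocks = []
    current = []   # lines of the block being collected
    depth = 0      # how many START lines are currently open
    for line in text.split("\n"):
        word = line.strip(" \t")  # marker must sit on a line of its own
        if word == "START":
            depth += 1
            current.append(line)
        elif word == "STOP" and depth > 0:
            current.append(line)
            depth -= 1
            if depth == 0:  # outermost block just closed
                blocks.append("\n".join(current))
                current = []
        elif depth > 0:     # a "normal" line inside a block
            current.append(line)
    return blocks


sample = (
    "START\n    START\n        stuff\n    STOP\n    more stuff\nSTOP\n"
    "\n"
    "START\n    START\n        stuff\n    STOP\n    more stuff\nSTOP\n"
)
blocks = find_blocks(sample)
print(len(blocks))
```

On the example from the question this yields two blocks, one per outer START/STOP pair, which is what the regex is meant to return with the /g flag.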

I've made \h*+ possessive in order to avoid accidentally matching " STOP" as a normal line. With a plain \h*, the engine could backtrack to 0 iterations of \h*; at that position the look-ahead sees " STOP" (with a leading space) rather than "STOP", so it succeeds and the line is accepted. The + forces \h to match as many times as it possibly can, so it has to consume the space.

Alternatively you could pull \h* into the look-ahead: (?!\h*(?:START|STOP)\h*$)
That would also work, but then the look-ahead would skip over any spaces to see whether they're followed by START/STOP, only to have [^\n]* outside go over those same spaces again. With \h*+ at the start, we match those spaces once, with no backtracking. I guess it's a micro-optimization.
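
To see the backtracking problem in isolation, here is a small Python sketch of just the "normal line" sub-pattern. A few assumptions: Python's re module has no \h, so [ \t] stands in for it, and possessive quantifiers only exist in newer Python versions (3.11+), so the sketch compares a plain greedy [ \t]* against the look-ahead variant instead. The names GREEDY, LOOKAHEAD and is_normal_line are made up for this illustration.

```python
import re

# [ \t] stands in for PCRE's \h (not supported by Python's re module).
# GREEDY: plain (non-possessive) whitespace before the look-ahead.
GREEDY = re.compile(r"[ \t]*(?!(?:START|STOP)[ \t]*$)[^\n]*")
# LOOKAHEAD: whitespace pulled into the look-ahead instead.
LOOKAHEAD = re.compile(r"(?![ \t]*(?:START|STOP)[ \t]*$)[^\n]*")


def is_normal_line(pattern, line):
    """True if the whole line is accepted as a 'normal' (non-marker) line."""
    return pattern.fullmatch(line) is not None


# The greedy version backtracks to 0 spaces, where the look-ahead sees
# " STOP" instead of "STOP", so the marker line slips through.
greedy_on_stop = is_normal_line(GREEDY, " STOP")
lookahead_on_stop = is_normal_line(LOOKAHEAD, " STOP")

# Ordinary content lines are accepted by both variants.
greedy_on_text = is_normal_line(GREEDY, "    more stuff")
lookahead_on_text = is_normal_line(LOOKAHEAD, "    more stuff")

print(greedy_on_stop, lookahead_on_stop, greedy_on_text, lookahead_on_text)
```

The greedy variant wrongly accepts " STOP" as a normal line, while the look-ahead variant rejects it; both accept ordinary content. In PCRE, \h*+ gets the same rejection without scanning the leading spaces twice.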

Answer 1: 3, accepted, 2016-06-26 22:03:55