REGEX PCRE Recursive expression for nested text matching
So I am trying to do something like this (yes, including newlines):
Match #1
START
START
stuff
STOP
more stuff
STOP
Match #2
START
START
stuff
STOP
more stuff
STOP
This is how far I have come:
START(.*?^(?:(?!STOP).)*$|(?R))|STOP
with the flags "g", "m", "i" and "s".
The problem is that I cannot match anything after the STOP without also matching up to the last "STOP" in the entire text.
Here is a regex101 example:
https://regex101.com/r/vD4nX6/1
I would appreciate some guidance.
Thanks in advance.
Here's a pattern that matches your example:
^\h*START\h*\n(?:\h*+(?!(?:START|STOP)\h*$)[^\n]*\n|(?R)\n)*\h*STOP\h*$
using the /mg flags (live at https://regex101.com/r/iK9tK5/1).
The idea behind it:
^                                 # beginning of line
\h* START \h* \n                  # "START" optionally surrounded by horizontal whitespace
                                  # on a line of its own
(?:                               # between START/STOP, every line is either "normal"
                                  # or a recursive START/STOP block
    \h*+                          # a normal line starts with optional horizontal whitespace
    (?!                           # ... not followed by ...
        (?: START | STOP ) \h* $  # "START" or "STOP" on their own
    )
    [^\n]* \n                     # any characters, then a newline
|
    (?R) \n                       # otherwise it's a recursive START/STOP block
)*                                # we can have as many items as we want between START/STOP
\h* STOP \h*                      # "STOP" optionally surrounded by horizontal whitespace
$                                 # end of line
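The same structure can be sketched without a regex engine, which may help when reading the breakdown above. This is only an illustration in Python, not the answer's method: the helper names match_block and find_blocks are made up here, and stripping spaces and tabs is a simplification of what \h matches in PCRE. Each branch of the loop corresponds to one alternative in the pattern.

```python
def match_block(lines, i):
    """Return the index just past the block that starts at lines[i], or None."""
    # A block must open with "START" on a line of its own.
    if i >= len(lines) or lines[i].strip(" \t") != "START":
        return None
    i += 1
    while i < len(lines):
        word = lines[i].strip(" \t")
        if word == "STOP":
            return i + 1              # closing line of this block
        if word == "START":
            end = match_block(lines, i)   # plays the role of (?R)
            if end is None:
                return None           # unclosed inner block: no match
            i = end
        else:
            i += 1                    # a "normal" line
    return None                       # ran out of lines before STOP


def find_blocks(text):
    """Collect every top-level START/STOP block, like the /g flag would."""
    lines = text.split("\n")
    blocks, i = [], 0
    while i < len(lines):
        end = match_block(lines, i)
        if end is None:
            i += 1
        else:
            blocks.append("\n".join(lines[i:end]))
            i = end
    return blocks


sample = (
    "Match #1\nSTART\n  START\n    stuff\n  STOP\n  more stuff\nSTOP\n"
    "Match #2\nSTART\nSTART\nstuff\nSTOP\nmore stuff\nSTOP"
)
blocks = find_blocks(sample)
```

Called on the sample text, find_blocks returns one string per outer block, with the nested block kept inside it.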
I've made \h*+ possessive so that an indented line such as " STOP" cannot accidentally match as a normal line. With a plain \h*, the engine could backtrack to 0 iterations, and at that position the text that follows is not "STOP" but " STOP" (with a space), so the look-ahead would pass. The + forces \h to match as many times as it possibly can, so it has to consume the space.
Alternatively, you could pull \h* into the look-ahead: (?!\h*(?:START|STOP)\h*$)
That would also work, but then the look-ahead would skip over any spaces to see whether they're followed by START/STOP, only to have [^\n]* outside go over those same spaces again. With \h*+ at the start, we match those spaces once, with no backtracking. I guess it's a micro-optimization.