A multi-line, variedly greedy, regular expression
Given the following text, what PCRE regular expression would you use to extract the parts marked in bold?
00:20314 lorem ipsum want this kryptonite 00:02314 quux padding dont want this 00:03124 foo neither this 00:01324 foo but we want this stalagmite 00:02134 tralala not this 00:03124 bar foo and we want this kryptonite but not this(!) 00:02134 foo bar and not this either 00:01234 dolor sit amet EOF
IOW, we want to extract sections that start, in regex terms, with "^0" and end with "(kryptonite|stalagmite)".
Been chomping on this for a bit, finding it a hard nut to crack. TIA!
One way to do this would be a negative lookahead combined with the inline (?sm) dotall and multi-line modifiers.
(?sm)^0(?:(?!^0).)*?(?:kryptonite|stalagmite)
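To see what the lazy tempered dot does, here is a minimal sketch in Python. Python's `re` is not PCRE, but it accepts this particular pattern unchanged. The `SAMPLE` text and its line layout are assumptions made for illustration (each record starting on its own line, each phrase on a separate line), not the exact input from the question.

```python
import re

# Hypothetical sample; the line layout is an assumption for illustration.
SAMPLE = "\n".join([
    "00:20314 lorem ipsum",
    "want this",
    "kryptonite",
    "00:02314 quux",
    "padding",
    "dont want this",
    "00:01324 foo",
    "but we want this",
    "stalagmite",
    "00:02134 tralala",
    "not this",
])

# (?s) lets "." cross newlines; (?m) lets "^" match at every line start.
# (?!^0) refuses to step onto the start of the next record, so a record
# without a keyword cannot borrow the keyword of a later record.
LAZY = re.compile(r"(?sm)^0(?:(?!^0).)*?(?:kryptonite|stalagmite)")

matches = LAZY.findall(SAMPLE)
for m in matches:
    print(repr(m))
```

The records starting with 00:02314 and 00:02134 contain no keyword, so they produce no match at all rather than a match that runs on into the following record.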
This looks like it works.
# (?ms)^0(?:(?!(?:^0|kryptonite|stalagmite)).)*(kryptonite|stalagmite)
(?ms)
^ 0
(?:
(?!
(?: ^ 0 | kryptonite | stalagmite )
)
.
)*
( kryptonite | stalagmite )
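The spaced-out layout above only works as a pattern when extended (free-spacing) mode is on, so that whitespace is ignored. A rough sketch of the same expression in Python, using `re.VERBOSE` together with the multi-line and dotall flags; the `SAMPLE` text is a hypothetical excerpt and its line breaks are an assumption.

```python
import re

# Hypothetical excerpt; the line layout is an assumption for illustration.
SAMPLE = "\n".join([
    "00:03124 bar foo",
    "and we want this",
    "kryptonite",
    "but not this(!)",
    "00:02134 foo bar",
    "and not this either",
    "00:01324 foo",
    "but we want this",
    "stalagmite",
])

TEMPERED = re.compile(
    r"""
    ^ 0                                            # a record starts with 0 at a line start
    (?:
        (?! (?: ^ 0 | kryptonite | stalagmite ) )  # not at the next record or at a keyword
        .                                          # consume one character, newlines included
    )*
    ( kryptonite | stalagmite )                    # group 1: the terminating keyword
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Pair each full match with the keyword captured in group 1.
results = [(m.group(0), m.group(1)) for m in TEMPERED.finditer(SAMPLE)]
for whole, keyword in results:
    print(keyword, repr(whole))
```

Because the keywords are part of the negative lookahead, the repetition can stay greedy: it stops at the first keyword by construction, which is why "but not this(!)" is left out of the match.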
I believe this will be the most efficient:
^0(?:\R(?!\R)|.)*?\b(?:kryptonite|stalagmite)\b
Obviously we start with ^0 and then end with either kryptonite or stalagmite (in a non-capturing group, for the heck of it) surrounded by \b word boundaries.
(?:\R(?!\R)|.)*? is the interesting part though, so let's break it down. One key concept first is PCRE's \R newline sequence.
(?: (?# start non-capturing group for repetition)
\R (?# match a newline character)
(?!\R) (?# not followed by another newline)
| (?# OR)
. (?# match any character, except newline)
)*? (?# lazily repeat this group)
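A hedged sketch of the breakdown above in Python. Two things here are assumptions rather than part of the answer: Python's `re` has no `\R`, so the hypothetical `NL` fragment stands in for it, and the `SAMPLE` text separates records with a blank line, since the `\R(?!\R)` term only stops the match where two newlines appear in a row. The multi-line flag is also switched on so that `^` can match at each record start.

```python
import re

# Stand-in for PCRE's \R (assumption: only \r\n, \n and \r occur in the input).
NL = r"(?:\r\n|\n|\r)"

# Hypothetical sample with a blank line between records.
SAMPLE = (
    "00:20314 lorem ipsum\nwant this\nkryptonite\n"
    "\n"
    "00:02314 quux\npadding\ndont want this\n"
    "\n"
    "00:01324 foo\nbut we want this\nstalagmite\n"
)

# A single newline may be consumed; a newline followed by another newline
# may not, and "." does not match a newline here because dotall is off.
PATTERN = re.compile(
    rf"^0(?:{NL}(?!{NL})|.)*?\b(?:kryptonite|stalagmite)\b",
    re.MULTILINE,
)

matches = [m.group(0) for m in PATTERN.finditer(SAMPLE)]
for m in matches:
    print(repr(m))
```

If the records are not separated by blank lines, this pattern has nothing that stops it at the next ^0, so one of the lookahead-based expressions is the safer choice for that layout.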
^(00:.*?(kryptonite|stalagmite)) with the s modifier