A multi-line, variedly greedy, regular expression

Given the following text, what PCRE regular expression would you use to extract the parts marked in bold?

00:20314 lorem ipsum
  want this
  kryptonite

00:02314 quux
  padding
  dont want this

00:03124 foo
     neither this

00:01324 foo
     but we want this
     stalagmite

00:02134 tralala
     not this

00:03124 bar foo
     and we want this
     kryptonite but not this(!)

00:02134 foo bar
     and not this either

00:01234 dolor sit amet
     EOF

IOW, we want to extract sections that start, in regex terms, with "^0" and end with "(kryptonite|stalagmite)".

Been chomping on this for a bit and finding it a hard nut to crack. TIA!

One way to do this is a negative lookahead combined with the inline (?sm) modifiers, dotall (s) and multi-line (m):

(?sm)^0(?:(?!^0).)*?(?:kryptonite|stalagmite)

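A quick way to sanity-check the lookahead answer is to run it against the sample text. The sketch below uses Python's re module instead of PCRE (an assumption for illustration; this particular pattern only needs features both engines share: inline (?sm) flags, a negative lookahead and a lazy quantifier). The helper name extract_sections is hypothetical.

```python
import re

# Sample text from the question (indentation preserved).
TEXT = """\
00:20314 lorem ipsum
  want this
  kryptonite

00:02314 quux
  padding
  dont want this

00:03124 foo
     neither this

00:01324 foo
     but we want this
     stalagmite

00:02134 tralala
     not this

00:03124 bar foo
     and we want this
     kryptonite but not this(!)

00:02134 foo bar
     and not this either

00:01234 dolor sit amet
     EOF
"""

# (?s): "." also matches "\n"; (?m): "^" matches at the start of every line.
# (?!^0) keeps "." from stepping onto the next "0..." line, so the lazy match
# can never spill from one section into the next.
SECTION_RE = re.compile(r"(?sm)^0(?:(?!^0).)*?(?:kryptonite|stalagmite)")


def extract_sections(text: str) -> list[str]:
    """Return every section that ends in kryptonite or stalagmite (hypothetical helper)."""
    return [m.group(0) for m in SECTION_RE.finditer(text)]


for section in extract_sections(TEXT):
    print(repr(section))
```

On the sample this finds three sections (lorem ipsum, 00:01324 foo and bar foo), each stopping right at its keyword, so "but not this(!)" is left out.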

This looks like it works:

 # (?ms)^0(?:(?!(?:^0|kryptonite|stalagmite)).)*(kryptonite|stalagmite)

 (?msx)   # x = free-spacing, so the layout below is ignored
 ^ 0
 (?:
      (?!
           (?: ^ 0 | kryptonite | stalagmite )
      )
      . 
 )*
 ( kryptonite | stalagmite )
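The expanded layout above only parses in free-spacing (x) mode. As a hedged illustration, here is the same greedy pattern in Python's re, with re.VERBOSE standing in for PCRE's x flag (Python is an assumption; the helper name extract_with_keyword is hypothetical). Because the lookahead also refuses to step onto a keyword, the greedy * cannot overshoot the first kryptonite or stalagmite in a section.

```python
import re

# Sample text from the question (indentation preserved).
TEXT = """\
00:20314 lorem ipsum
  want this
  kryptonite

00:02314 quux
  padding
  dont want this

00:03124 foo
     neither this

00:01324 foo
     but we want this
     stalagmite

00:02134 tralala
     not this

00:03124 bar foo
     and we want this
     kryptonite but not this(!)

00:02134 foo bar
     and not this either

00:01234 dolor sit amet
     EOF
"""

# re.VERBOSE plays the role of PCRE's x flag: layout whitespace and comments are ignored.
SECTION_RE = re.compile(
    r"""
    ^ 0                                            # a section starts with 0 at a line start
    (?:
        (?! (?: ^ 0 | kryptonite | stalagmite ) )  # not at the next section or a keyword...
        .                                          # ...so consume one character (newlines too)
    )*
    ( kryptonite | stalagmite )                    # group 1: the keyword closing the section
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)


def extract_with_keyword(text: str) -> list[tuple[str, str]]:
    """Return (whole section, closing keyword) pairs (hypothetical helper)."""
    return [(m.group(0), m.group(1)) for m in SECTION_RE.finditer(text)]


for section, keyword in extract_with_keyword(TEXT):
    print(keyword, repr(section))
```

Since the pattern has exactly one capturing group, re.findall would return only the keywords; finditer with group(0) keeps the whole section.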

I believe this will be the most efficient:

^0(?:\R(?!\R)|.)*?\b(?:kryptonite|stalagmite)\b



Obviously we start with ^0 (with the m flag on, so that ^ matches at the start of every line and not only at the start of the string) and end with either kryptonite or stalagmite (in a non-capturing group, just for the heck of it), surrounded by \b word boundaries.

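(?:\R(?!\R)|.)*? is the interesting part, though, so let's break it down. One key concept first is PCRE's \R escape, which matches any newline sequence (\n, \r, or \r\n as a single unit, among others). Since . (without the s flag) never matches a newline, the only way onto the next line is a single \R, and the lookahead refuses a second one, so the match cannot cross the blank line between sections.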

(?:      (?# start non-capturing group for repetition)
  \R     (?# match one newline sequence)
  (?!\R) (?# not followed by another newline)
 |       (?# OR)
  .      (?# match any character, except newline)
)*?      (?# lazily repeat this group)
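Python's re module has no \R (it rejects it as a bad escape), so the sketch below swaps in (?:\r\n|\n|\r) as a stand-in that covers the common line endings, though not every newline PCRE's \R accepts. It also adds (?m), assuming the original demo ran with the m flag so that ^0 can match at every line start. The names NEWLINE and extract_paragraph_sections are hypothetical.

```python
import re

# Sample text from the question (indentation preserved).
TEXT = """\
00:20314 lorem ipsum
  want this
  kryptonite

00:02314 quux
  padding
  dont want this

00:03124 foo
     neither this

00:01324 foo
     but we want this
     stalagmite

00:02134 tralala
     not this

00:03124 bar foo
     and we want this
     kryptonite but not this(!)

00:02134 foo bar
     and not this either

00:01234 dolor sit amet
     EOF
"""

# Hypothetical stand-in for PCRE's \R (assumption: only \r\n, \n and \r matter here).
NEWLINE = r"(?:\r\n|\n|\r)"

# No DOTALL: "." stops at "\n", so the only way onto the next line is a single
# NEWLINE that is not immediately followed by another one. A blank line therefore
# ends the section, and the lazy *? stops at the first whole-word keyword.
SECTION_RE = re.compile(
    rf"(?m)^0(?:{NEWLINE}(?!{NEWLINE})|.)*?\b(?:kryptonite|stalagmite)\b"
)


def extract_paragraph_sections(text: str) -> list[str]:
    """Return each blank-line-delimited section ending in a keyword (hypothetical helper)."""
    return [m.group(0) for m in SECTION_RE.finditer(text)]


for section in extract_paragraph_sections(TEXT):
    print(repr(section))
```

Unlike the lookahead answers, this one relies on the blank line between sections rather than on the next ^0, so it would need adjusting if sections were not separated by blank lines.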

^(00:.*?(kryptonite|stalagmite)) with the s modifier
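For comparison, here is a short check of that last pattern, read as ^(00:.*?(kryptonite|stalagmite)) plus the s flag, again in Python's re as an assumed stand-in for PCRE. With s alone, ^ only anchors at the start of the string, so just the first section is found; adding m to reach the other sections lets the lazy .*? run across blank lines and 00: headers until it meets the next keyword.

```python
import re

# Sample text from the question (indentation preserved).
TEXT = """\
00:20314 lorem ipsum
  want this
  kryptonite

00:02314 quux
  padding
  dont want this

00:03124 foo
     neither this

00:01324 foo
     but we want this
     stalagmite

00:02134 tralala
     not this

00:03124 bar foo
     and we want this
     kryptonite but not this(!)

00:02134 foo bar
     and not this either

00:01234 dolor sit amet
     EOF
"""

PLAIN_LAZY = r"^(00:.*?(kryptonite|stalagmite))"

# s only: "^" anchors at the start of the string, so at most one match is possible.
s_only = [m.group(1) for m in re.finditer(PLAIN_LAZY, TEXT, re.DOTALL)]

# s plus m: "^" now matches at every line start, but nothing stops the lazy ".*?"
# from crossing into the following sections before it reaches a keyword.
s_and_m = [m.group(1) for m in re.finditer(PLAIN_LAZY, TEXT, re.DOTALL | re.MULTILINE)]

print(len(s_only), "match with s only")
for section in s_and_m:
    print(repr(section))
```

With s and m, the second match starts at 00:02314 quux and swallows the unwanted "dont want this" and "neither this" lines before reaching stalagmite, which is the boundary problem the lookahead and \R answers avoid.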

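Disclaimer: the technical posts on this site are provided under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM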