Recursive PCRE search with patterns

This question has to do with PCRE.

I have seen a recursive search for nested parentheses that uses this construct:

\(((?>[^()]+)|(?R))*\)

The problem with this is that, while '[^()]+' can match any character (including newline) other than the parentheses themselves, the delimiters are limited to single characters, such as braces, brackets, punctuation, single letters, etc.

What I am trying to do is replace the '(' and ')' characters with ANY kind of pattern (keywords such as 'BEGIN' and 'END', for example).

I have come up with the following construct:

(?xs)  (?# <-- 'xs' ignore whitespace in the search term, and allows '.'
               to match newline )
(?P<pattern1>BEGIN)
(
   (?> (?# <-- "once only" search )
      (
         (?! (?P=pattern1) | (?P<pattern2>END)).
      )+
   )
   | (?R)
)*
END

This actually works on something that looks like this:

BEGIN <<date>>
  <<something>
    BEGIN
      <<something>>
    END <<comment>>
    BEGIN <<time>>
      <<more somethings>>
      BEGIN(cause we can)END
      BEGINEND
    END
  <<something else>>
END

This successfully matches any nested BEGIN..END pairs.
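For anyone who wants to cross-check what the regex is expected to match, here is a minimal Python sketch. It is not PCRE and does not use recursion: it swaps in a plain token scan with a depth counter. The function name `outermost_blocks` and its parameter names are made up for illustration.

```python
import re

def outermost_blocks(text, open_kw="BEGIN", close_kw="END"):
    """Return each top-level balanced open_kw..close_kw span in text."""
    # Both delimiters are specified once and reused everywhere below.
    token = re.compile("|".join(re.escape(k) for k in (open_kw, close_kw)))
    blocks, depth, start = [], 0, None
    for m in token.finditer(text):
        if m.group() == open_kw:
            if depth == 0:
                start = m.start()      # a new top-level block opens here
            depth += 1
        elif depth > 0:                # ignore a stray closing keyword
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.end()])
    return blocks

sample = "x BEGIN a BEGIN b END c END y BEGINEND"
print(outermost_blocks(sample))
# ['BEGIN a BEGIN b END c END', 'BEGINEND']
```

Unlike the regex, this sketch does not fall back to an inner block when an outer BEGIN is never closed, so treat it as a sanity check on well-formed input rather than a drop-in replacement.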

I set up named patterns pattern1 and pattern2 for BEGIN and END, respectively. Using pattern1 in the search term works fine. However, I can't use pattern2 at the end of the search: I have to write out 'END'.

Any idea how I can rewrite this regex so that I only have to specify the patterns once and can use them "everywhere" within it? In other words, so that I don't have to write END both in the middle of the search and at the very end.

To further extend @Kobi's answer, please see the following regex:

(?xs)
(?(DEFINE)
        (?<pattern1>BEGIN)
        (?<pattern2>END)
)
(?=((?&pattern1)
(?:
   (?> (?# <-- "once only" search )
      (?:
         (?! (?&pattern1) | (?&pattern2)) .
      )+
   )*
   | (?3)
)*
(?&pattern2)
))

This regex will even let you fetch the data for each individual block, nested ones included: because the whole match sits inside a zero-width lookahead, the engine can try again at every later BEGIN. Use the third capture group, as the first two are taken by the groups defined in the DEFINE block.

Demo: http://regex101.com/r/bX8mB6
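To make "each individual data block" concrete, here is a rough Python sketch of the result one would expect to collect from the third group, nested blocks included. It is a swapped-in technique (a token scan with a stack), not the regex above, and the name `all_blocks` and its parameters are hypothetical.

```python
import re

def all_blocks(text, open_kw="BEGIN", close_kw="END"):
    """Return every balanced block, nested ones included, ordered by start."""
    token = re.compile(f"{re.escape(open_kw)}|{re.escape(close_kw)}")
    stack, found = [], []
    for m in token.finditer(text):
        if m.group() == open_kw:
            stack.append(m.start())          # remember where this block opens
        elif stack:                          # a close that has a matching open
            start = stack.pop()
            found.append((start, text[start:m.end()]))
    return [block for _, block in sorted(found)]

print(all_blocks("BEGIN a BEGIN b END c END"))
# ['BEGIN a BEGIN b END c END', 'BEGIN b END']
```

An opening keyword that is never closed simply stays on the stack and produces no block, which is also how the lookahead version behaves at that position.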

This looks like a good use case for a (?(DEFINE)...) block, which defines named subpatterns without matching anything itself, so that they can be called later with (?&name). A Perl example would be:

(?xs)
(?(DEFINE)
        (?<pattern1>BEGIN)
        (?<pattern2>END)
)
(?&pattern1)
(
   (?> (?# <-- "once only" search )
      (
         (?! (?&pattern1) | (?&pattern2)).
      )+
   )
   | (?R)
)*
(?&pattern2)

Example: http://ideone.com/8o9cg

(Please note that I don't really know any Perl, and I couldn't get this to work in PHP on any of the online testers.)

See also: http://www.pcre.org/pcre.txt (search for "(?(DEFINE)"; the documentation doesn't appear to be split into separate pages)


A low-tech solution that works on most flavors is to use lookaheads at the start of the pattern:

(?=.*?(?P<pattern1>BEGIN))
(?=.*?(?P<pattern2>END))
...
(?P=pattern1) (?# should work - it was captured )
