Recursive regex for matching everything in parentheses (PCRE)

I am surprised that I can't easily find a similar question with an answer on SO. I would like to match everything inside certain function calls. The idea is to remove the functions that are useless.

foo(some (content)) --> some (content)

So I am trying to match everything in the function call, which can itself include parentheses. Here is my PCRE regex:

(?<name>\w+)\s*\(\K
(?<e>
     [^()]+
     |
     [^()]*
         \((?&e)\)
     [^()]*
)*
(?=\))

https://regex101.com/r/gfMAIM/1

Unfortunately it doesn't work and I don't really understand why.

Your Group e pattern does not do the right job: currently, it matches parentheses only 1 depth level deep, as you only recursed the e pattern once. It needs to match as many (...) substrings as are present, and thus the subroutine pattern needs to be inside a * or + quantified group. It can even be "simplified" to (?<e>[^()]*(?:\((?&e)\)[^()]*)*).

Note that your Group e pattern is equal to (?<e>[^()]+|\((?&e)\))*. The [^()]* around \((?&e)\) are redundant since the [^()]+ alternative will consume the chars other than ( and ) on the current depth level.

Also, you quantified the Group e pattern, making it a repeated capturing group that only keeps the text matched during the last iteration.
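The "repeated capturing group" effect is easy to reproduce outside PCRE. Here is a minimal sketch in Python, whose built-in re module has no recursion, so a one-level stand-in pattern is assumed instead of the real Group e (note that Python spells named groups (?P<e>...)):

```python
import re

# Quantifier outside the group: each iteration overwrites the capture.
repeated = re.fullmatch(r"(?P<e>[^()]+|\([^()]*\))*", "a(b)c")
last_iteration = repeated.group("e")

# Quantifier inside the group: the capture spans every iteration.
wrapped = re.fullmatch(r"(?P<e>(?:[^()]+|\([^()]*\))*)", "a(b)c")
whole_text = wrapped.group("e")

print(last_iteration)  # c
print(whole_text)      # a(b)c
```

Moving the quantifier inside the group is the same move the simplified pattern makes.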

You may use

(?<name>\w+)\s*\(\K(?<e>[^()]*(?:\((?&e)\)[^()]*)*)(?=\))

See the regex demo

Details

  • (?<name>\w+)\s*\(\K - 1+ word chars, 0+ whitespaces and a (, all of which are omitted from the match
  • (?<e> - start of Group e
    • [^()]* - 0+ chars other than ( and )
    • (?: - start of a non-capturing group:
      • \( - a ( char
      • (?&e) - Group e pattern recursed
      • \) - a ) char
      • [^()]* - 0+ chars other than ( and )
    • )* - 0 or more repetitions
  • ) - end of Group e
  • (?=\)) - a ) must be immediately to the right of the current location.
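If you want to cross-check what the pattern above should return, or you are stuck with an engine that has no recursion, the same balanced-parentheses idea can be sketched with a plain depth counter. This is not the regex itself, just a hand-written stand-in in Python; the function name call_arguments is made up for the illustration:

```python
import re

HEAD = re.compile(r"(\w+)\s*\(")  # function name, optional spaces, opening (

def call_arguments(text):
    """Return (name, inner_text) for each outermost name(...) call that is balanced."""
    results = []
    pos = 0
    while True:
        m = HEAD.search(text, pos)
        if m is None:
            break
        depth = 1          # we are just past the opening parenthesis
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:
            # i is one past the closing parenthesis of this call
            results.append((m.group(1), text[m.end():i - 1]))
            pos = i
        else:
            # unbalanced: give up on this call and keep scanning after its (
            pos = m.end()
    return results

print(call_arguments("foo(some (content))"))  # [('foo', 'some (content)')]
```

Like the regex, it treats every parenthesis as structural, so parentheses inside quoted strings will confuse it.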

The following regex does the matching without taking extra steps:

(?<name>\w+)\s*(\((?<e>([^()]*+|(?2))+)\))

See live demo here

But that doesn't match the following strings, which contain unbalanced parentheses inside a quoted string:

  • foo(bar = ')')
  • foo(bar(john = "(Doe..."))

So what you should look for is:

(?<name>\w+)\s*(\((?<e>([^()'"]*+|"(?>[^"\\]*+|\\.)*"|'(?>[^'\\]*+|\\.)*'|(?2))+)\))

See live demo here

Regex breakdown:

  • (?<name>\w+)\s* Match the function name and trailing spaces
  • ( Start of capturing group #2, the group that (?2) recurses into
    • \( Match a literal (
    • (?<e> Start of named capturing group e
      • ( Start of the inner capturing group (#4)
        • [^()'"]*+ Match anything except ()'"
        • | Or
        • "(?>[^"\\]*+|\\.)*" Match anything between double quotes
        • | Or
        • '(?>[^'\\]*+|\\.)*' Match anything between single quotes
        • | Or
        • (?2) Recurse capturing group #2, i.e. a nested (...)
      • )+ Repeat as much as possible, at least once
    • ) End of named capturing group e
    • \) Match ) literally
  • ) End of capturing group #2
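To see why the quoted-string alternatives matter, here is a rough non-recursive sketch in Python that walks the text token by token: quoted strings (with backslash escapes) are swallowed whole, so parentheses inside them never touch the depth counter. The names TOKEN, HEAD and quoted_aware_arguments are invented for this example, and it only looks at the first call in the text:

```python
import re

# One token per step: a double-quoted string, a single-quoted string,
# a parenthesis, a run of other characters, or (fallback) any single char
# such as a stray unclosed quote.
TOKEN = re.compile(r'''"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[()]|[^()'"]+|.''', re.S)
HEAD = re.compile(r"(\w+)\s*\(")

def quoted_aware_arguments(text):
    """Return (name, inner_text) for the first name(...) call, or None if unbalanced."""
    m = HEAD.search(text)
    if m is None:
        return None
    depth, pos = 1, m.end()
    while pos < len(text):
        tok = TOKEN.match(text, pos)
        piece = tok.group(0)
        if piece == "(":
            depth += 1
        elif piece == ")":
            depth -= 1
            if depth == 0:
                return m.group(1), text[m.end():pos]
        pos = tok.end()
    return None

print(quoted_aware_arguments("foo(bar = ')')"))
# ('foo', "bar = ')'")
```

The two problem strings from above both come out balanced this way, because the ) and ( inside the quotes are consumed as part of a string token.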

I have a simple regex without recursion.

(?<=[\w ]{2}\().*(?=\))

So far it deals with unbalanced parentheses, but it does not deal with multiple functions on one line. That could be handled if you know the delimiter between the functions, e.g. ; if that is Java code.

Variant 2 (updated for multiple functions on one line):

(?<=[\w ]\()[^;\n]*(?=\))

Variant 3 (allowing ; in strings):

(?<=[\w ]\()([^;\n]|".*?")*(?=\))    

Variant 4 (handling escaped quotes in strings):

(?<=[\w \n]\()(?:[^;\n"]|(?:"(?:[^"]|\\")*?(?<!\\)"))*(?=\))
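Since these variants need no recursion and their lookbehinds are fixed-width, they also run in engines other than PCRE. A quick check of the first pattern and Variant 2 with Python's re module (the sample strings are my own, not from the question) shows the multiple-functions limitation and how the ; delimiter works around it:

```python
import re

variant1 = re.compile(r"(?<=[\w ]{2}\().*(?=\))")
variant2 = re.compile(r"(?<=[\w ]\()[^;\n]*(?=\))")

# A single call with nested parentheses works with the first pattern.
single_call = variant1.findall("foo(some (content))")

# Two calls on one line: the greedy .* runs on to the last ) of the line.
two_calls_v1 = variant1.findall("foo(a); bar(b)")

# Variant 2 stops at the ; delimiter, so each call is matched separately.
two_calls_v2 = variant2.findall("foo(a); bar(b)")

print(single_call)   # ['some (content)']
print(two_calls_v1)  # ['a); bar(b']
print(two_calls_v2)  # ['a', 'b']
```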
