Is there a regex-like syntax that is capable of parsing matching symbols?

This regular expression

/\(.*\)/

won't stop at the matching closing parenthesis; it runs to the last parenthesis in the string. Is there a regular expression extension, or something similar, with a proper syntax that allows for this? For example:

there are (many (things (on) the)) box (except (carrots (and apples)))

/OPEN(.*CLOSE)/ should match (many (things (on) the))

There could be arbitrarily many levels of parentheses.

If you only have one level of parentheses, then there are two possibilities.

Option 1: use ungreedy (lazy) repetition:

/\(.*?\)/

This will stop at the first ).

Option 2: use a negated character class:

/\([^)]*\)/

This can only repeat characters that are not ), so it can never go past the first closing parenthesis. This option is usually preferred for performance reasons. In addition, it is more easily extended to allow for escaped parentheses (so that you could match this complete string: (some\)thing) instead of throwing away thing) ). But this is probably rarely necessary.
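As a rough illustration of the two options, the JavaScript sketch below runs both on a string with a single level of parentheses. The escape-aware pattern at the end is an assumed way of extending option 2, not something given in the answer; the sample strings and variable names are made up for the example.

```javascript
// Option 1 (lazy quantifier) and option 2 (negated character class)
// behave the same on a string with one level of parentheses.
const flat = "call (one) then (two)";
const lazy = flat.match(/\(.*?\)/g);
const negated = flat.match(/\([^)]*\)/g);
console.log(lazy);     // [ '(one)', '(two)' ]
console.log(negated);  // [ '(one)', '(two)' ]

// Assumed extension of option 2: a backslash plus the character after it is
// consumed as one unit, so an escaped ")" does not end the match.
const escaped = String.raw`keep (some\)thing) whole`;
const truncated = escaped.match(/\([^)]*\)/g);
const escapedMatches = escaped.match(/\((?:[^)\\]|\\.)*\)/g);
console.log(truncated);      // plain option 2 stops at the escaped ")"
console.log(escapedMatches); // the escape-aware variant keeps the whole group
```

The plain negated class stops after `(some\)`, while the escape-aware variant returns `(some\)thing)` as a single match.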

However, if you want nested structures, this is generally too complicated for regex (although some flavors like PCRE support recursive patterns). In this case, you should just go through the string yourself and count parentheses to keep track of your current nesting level.
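A minimal sketch of that counting approach in JavaScript follows. The function name `outermostGroups` is hypothetical, and the handling of unbalanced input (stray closing parentheses are ignored, unclosed groups are dropped) is an assumption made for the example.

```javascript
// Collect every outermost parenthesised group by tracking nesting depth.
// `outermostGroups` is a hypothetical helper, not part of any library.
function outermostGroups(text) {
  const groups = [];
  let depth = 0;   // current nesting level
  let start = -1;  // index of the "(" that opened the current outermost group
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
      if (depth === 0) groups.push(text.slice(start, i + 1));
    }
    // A ")" seen at depth 0 is unmatched and is simply skipped.
  }
  return groups; // a group that is never closed is not reported
}

const sample =
  "there are (many (things (on) the)) box (except (carrots (and apples)))";
console.log(outermostGroups(sample));
// [ '(many (things (on) the))', '(except (carrots (and apples)))' ]
```

The loop is a single pass over the string and needs no regex engine support for recursion.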

Just as a side note about those recursive patterns: in PCRE, (?R) simply represents the whole pattern, so inserting it somewhere makes the whole thing recursive. But then the content of every pair of parentheses must have the same structure as the whole match. Also, it is not really possible to do meaningful one-step replacements with this, or to use capturing groups on multiple nested levels. All in all, you are best off not using regular expressions for nested structures.

Update: Since you seem eager to find a regex solution, here is how you would match your example using PCRE (example implementation in PHP):

$str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';
preg_match_all('/\([^()]*(?:(?R)[^()]*)*\)/', $str, $matches);
print_r($matches);

results in

Array
(
    [0] => Array
        (
            [0] => (many (things (on) the))
            [1] => (except (carrots (and apples)))
        )   
)

What the pattern does:

\(      # opening bracket
[^()]*  # arbitrarily many non-bracket characters
(?:     # start a non-capturing group for later repetition
(?R)    # recursion! (match any nested brackets)
[^()]*  # arbitrarily many non-bracket characters
)*      # close the group and repeat it arbitrarily many times
\)      # closing bracket

This allows for arbitrarily deep nesting and also for arbitrarily many parallel groups.

Note that it is not possible to get all nested levels as separate captured groups. You will always just get the innermost or outermost group. Also, doing a recursive replacement is not possible like this.
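If every nesting level is needed as a separate result, a manual scan with a stack is one way to get it. The JavaScript sketch below is an illustration of that idea and not part of the PCRE approach above; `allGroups` and the shape of the returned objects are hypothetical.

```javascript
// Report every balanced group together with its nesting depth.
// `allGroups` is a hypothetical helper, not part of any library.
function allGroups(text) {
  const open = [];    // stack of indices of "(" that are not yet closed
  const groups = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "(") {
      open.push(i);
    } else if (text[i] === ")" && open.length > 0) {
      const start = open.pop();
      // After the pop, open.length counts the enclosing groups.
      groups.push({ depth: open.length + 1, text: text.slice(start, i + 1) });
    }
  }
  return groups; // ordered by closing position, so inner groups come first
}

const levels = allGroups("a (many (things (on) the)) b");
console.log(levels);
```

For this input the result lists `(on)` at depth 3, `(things (on) the)` at depth 2, and `(many (things (on) the))` at depth 1.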

Regular expressions are not powerful enough to find matching parentheses, because parentheses are nested structures. There exists a simple algorithm to find matching parentheses, though, which is described in this answer.

If you are just trying to find the first right parenthesis in an expression, you should use a non-greedy matcher in your regex. In this case, the non-greedy version of your regex is the following:

/\(.*?\)/

Given a string containing nested matching parentheses, you can either match the innermost sets with this (non-recursive JavaScript) regex:

var re = /\([^()]*\)/g;

Or you can match the outermost sets with this (recursive PHP) regex:

$re = '/\((?:[^()]++|(?R))*\)/';

But you cannot easily match sets of matching parentheses that are in between the innermost and outermost.

Note also that the (naive and frequently encountered) expression /\(.*?\)/ will always match nested input incorrectly: it yields neither the innermost nor the outermost matched sets.
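To make the difference concrete, this short JavaScript sketch runs the innermost pattern and the naive lazy pattern on a shortened version of the question's example. The variable names are made up for the illustration.

```javascript
const nested = "there are (many (things (on) the)) box";

// Innermost sets: no parenthesis of either kind is allowed inside the match.
const innermost = nested.match(/\([^()]*\)/g);

// Naive lazy pattern: starts at the first "(" and stops at the first ")".
const naive = nested.match(/\(.*?\)/g);

console.log(innermost); // [ '(on)' ]
console.log(naive);     // [ '(many (things (on)' ]
```

The naive result starts at the outermost opening parenthesis but ends at the innermost closing one, so it is not a balanced group at all.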
