How do I match nested braces with a regular expression in PHP?

Question

I have a LaTeX document I want to match against, and I need a regex that matches the following:

\\ # the backslash in the beginning
[a-zA-Z]+ #a word
(\{.+\})* # any amount of {something}

However, here is the catch:

The last line (1) needs to be greedy and (2) needs to contain a matching number of { and } inside itself.

Meaning that if I have the string \test{something\somthing{9}}, it would match the whole thing. The braces also need to come in that order ( {} ). So it should not match the following as a whole:

\LaTeX{} is a document preparation system for the \TeX{}

but just

\LaTeX{}

and

\TeX{}

Can anyone help? Maybe someone has a better idea for matching? Should I not use regular expressions?

Answer 1

This can be done with recursion:

$input = "\LaTeX{} is a document preparation system for the \TeX{}
\latex{something\somthing{9}}";

preg_match_all('~(?<token>
        \\\\ # the backslash in the beginning
        [a-zA-Z]+ # a word
        (\{[^{}]*((?P>token)[^{}]*)?\}) # {something}, optionally containing one nested token
)~x', $input, $matches);

This correctly matches \LaTeX{}, \TeX{}, and \latex{something\somthing{9}}.
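The question also asks for "any amount of {something}", which the pattern above does not cover: it expects exactly one brace group with at most one nested token. A minimal sketch of a variant follows. It is not the answerer's regex, and it assumes that a command is a backslash, letters, and zero or more balanced groups that may themselves contain bare braces. The group name `group` and the sample input are made up for illustration.

```php
<?php
// Sketch of a variant pattern (an assumption, not the answer's exact regex).
$pattern = '~
    \\\\ [a-zA-Z]+          # backslash followed by the command name
    (?<group>               # one balanced brace group
        \{
        (?: [^{}]++         # a run of non-brace characters
          | (?&group)       # or a nested balanced group (recursion)
        )*
        \}
    )*                      # any number of groups, including none
~x';

$input = '\LaTeX{} and \TeX{}, then \test{something\somthing{9}} and \frac{a}{b}';

preg_match_all($pattern, $input, $matches);
$commands = $matches[0];    // full matches only
print_r($commands);
```

With this input the full matches should be \LaTeX{}, \TeX{}, \test{something\somthing{9}} and \frac{a}{b}. Because the group is quantified with `*`, a bare command without braces would also match, which mirrors the `(\{.+\})*` in the question.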

Answer 2

PHP can be used, since it supports recursive regex matching. But, as I said, if your LaTeX-like strings contain comments that can have { or } in them, this will fail.

A demo:

$text = 'This is a \LaTeX{ foo { bar { ... } baz test {} done } } document
preparation system for the \TeX{a{b{c}d}e{f}g{h}i}-y people out there';
preg_match_all('/\\\\[A-Za-z]+(\{(?:[^{}]|(?1))*})/', $text, $matches, PREG_SET_ORDER);
print_r($matches);

which produces:

Array
(
    [0] => Array
        (
            [0] => \LaTeX{ foo { bar { ... } baz test {} done } }
            [1] => { foo { bar { ... } baz test {} done } }
        )

    [1] => Array
        (
            [0] => \TeX{a{b{c}d}e{f}g{h}i}
            [1] => {a{b{c}d}e{f}g{h}i}
        )

)

A quick explanation:

\\\\         # the literal '\'
[A-Za-z]+    # one or more letters
(            # start capture group 1   <-----------------+
  \{         #   the literal '{'                         |
  (?:        #   start non-capture group A               |
    [^{}]    #     any character other than '{' and '}'  |
    |        #     OR                                    |
    (?1)     #     recursively match capture group 1  ---+
  )          #   end non-capture group A
  *          #   non-capture group A zero or more times
  }          #   the literal '}'
)            # end capture group 1
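The caveat at the start of this answer, about comments that contain { or }, can be illustrated with a rough pre-processing step. The sketch below assumes a LaTeX comment runs from an unescaped % to the end of the line; `stripLatexComments` is a hypothetical helper, and edge cases such as `\\%` or verbatim environments are not handled.

```php
<?php
// Hypothetical helper: drop everything from an unescaped % to the end of the line.
// Assumption: that is a good enough approximation of a LaTeX comment here.
function stripLatexComments(string $text): string
{
    return preg_replace('/(?<!\\\\)%.*$/m', '', $text);
}

// The same recursive pattern as in the demo above.
$pattern = '/\\\\[A-Za-z]+(\{(?:[^{}]|(?1))*})/';

// A stray "{" inside a comment unbalances the braces.
$text = "\\TeX{a % stray { in a comment\n b} done";

$rawCount = preg_match_all($pattern, $text, $rawMatches);   // fails on the raw text
$clean    = stripLatexComments($text);
preg_match_all($pattern, $clean, $matches);
$found    = $matches[0];                                    // works after stripping
```

On the raw text the stray brace opens a group that swallows the real closing brace, so nothing matches. After the comment is removed, the command and its argument match as a whole.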

Answer 3

Unfortunately, I believe this is impossible. Bracket matching (detecting properly paired, nested brackets) is commonly used as an example of a problem that cannot be solved with a finite state machine, such as a regular expression parser. You could do it with a context-free grammar, but that's just not how regex works. Your best solution is to use a regex like {*[^{}]*}* for the initial check, and then another short script to check whether it's an even number.

In conclusion: don't try to do it with only regex. This is not a problem that can be solved with regex alone.
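The "short script" idea can be sketched as a plain depth counter: a simple regex finds the command name, and a loop walks the braces. `balancedBlock` is a hypothetical helper written for this illustration, and it assumes braces are never escaped or hidden inside comments.

```php
<?php
// Hypothetical helper: returns the balanced {...} block starting at $start,
// or null if $text[$start] is not "{" or the braces never balance.
function balancedBlock(string $text, int $start): ?string
{
    if (($text[$start] ?? '') !== '{') {
        return null;
    }
    $depth  = 0;
    $length = strlen($text);
    for ($i = $start; $i < $length; $i++) {
        if ($text[$i] === '{') {
            $depth++;
        } elseif ($text[$i] === '}') {
            $depth--;
            if ($depth === 0) {
                // Back at depth zero: the block that opened at $start is closed.
                return substr($text, $start, $i - $start + 1);
            }
        }
    }
    return null; // ran out of input before the braces closed
}

$text = 'see \test{something\somthing{9}} here';

// A non-recursive regex finds the command name; the loop handles the nesting.
preg_match('/\\\\[a-zA-Z]+/', $text, $m, PREG_OFFSET_CAPTURE);
$name  = $m[0][0];
$block = balancedBlock($text, $m[0][1] + strlen($name));
```

Here `$name` is \test and `$block` is {something\somthing{9}}. Counting depth, rather than checking that the number of braces is even, also rejects inputs such as }{ where the braces appear in the wrong order.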

3 answers

Answer 1: score 2, accepted, 2011-01-21 13:14:39

Answer 2: score 2, 2011-01-21 13:22:09

Answer 3: score -1, 2011-01-21 13:08:59
