Regex pattern to get string between curly braces

I have a string: The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.

I want to get all the strings that are between curly braces. Curly braces inside curly braces must be ignored. The expected output as a PHP array would be:

[0] => fox, dragon, dinosaur
[1] => dog, cat, bear, {lion, tiger}

I tried the pattern \\{([\\s\\S]*)\\} from the answer by Mar to "Regex pattern extract string between curly braces and exclude curly braces", but it seems this pattern grabs everything between curly braces without splitting off the unrelated text in between (I am not sure of the right word to use). Here is the output of the pattern above:

fox, jumps, over} over the lazy {dog, cat, bear, {lion, tiger}}

What is the best regex pattern to produce the expected output from the sentence above?

You can use this recursive regex pattern in PHP:

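// Group 1 is a whole {...} pair; (?1) recurses into it to consume nested pairs.
$re = '/( { ( (?: [^{}]* | (?1) )* ) } )/x';
$str = "The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.";

preg_match_all($re, $str, $matches);
// Group 2 holds the text inside each outermost pair of braces.
print_r($matches[2]);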

RegEx Demo

As anubhava said, you can use a recursive pattern to do that.

However, his version is pretty "slow" and doesn't cover all cases.

I'd personally use this regex:

#({(?>[^{}]|(?0))*?})#

As you can see at http://lumadis.be/regex/test_regex.php?id=2516, it is a lot faster and matches more results.

So, how does it work?

/
  (              # capturing group
    {            # matches a literal '{'
    (?>          # atomic group: the engine never backtracks into it
        [^{}]    #   matches any single character other than '{' or '}'
      |          # or
        (?0)     #   recurses into the whole pattern to match a nested {...} group
    )*?          # repeated lazily, as many times as needed
    }            # matches a literal '}'
  )              # ends the capture
/x
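
To connect the breakdown above to the output the question asks for, here is a minimal sketch of the pattern in use. The helper name extractBraceGroups() is hypothetical (it is not part of the original answer), and the arrow function assumes PHP 7.4 or later. Because the pattern matches each group including its outer braces, the sketch strips the first and last character of every match afterwards.

```php
<?php
// Hypothetical helper (name chosen for illustration only): returns the text
// inside each outermost {...} pair, leaving nested braces untouched.
function extractBraceGroups(string $text): array
{
    // (?0) recurses into the whole pattern, so nested pairs stay balanced.
    $pattern = '#({(?>[^{}]|(?0))*?})#';
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches, outer braces included;
    // substr($m, 1, -1) drops the first and last character of each one.
    return array_map(fn($m) => substr($m, 1, -1), $matches[0]);
}

$str = "The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.";
print_r(extractBraceGroups($str));
```

For the sample sentence this should give the two entries from the question: "fox, dragon, dinosaur" and "dog, cat, bear, {lion, tiger}".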

Why did I use "*?"?

Adding the '?' to '*' makes it non-greedy. If you use a greedy quantifier there, the engine will start many more subroutine calls than it would with a non-greedy one. (If you need more explanation, let me know.)
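
As a rough check that the choice of quantifier changes only how much work the engine does, not what it finds, the following sketch runs the lazy and greedy variants on the sample sentence and compares their matches. The variable names are illustrative, and the comparison says nothing about speed; timing would need a separate measurement.

```php
<?php
$str = "The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.";

$lazy   = '#({(?>[^{}]|(?0))*?})#'; // pattern from the answer
$greedy = '#({(?>[^{}]|(?0))*})#';  // same pattern with a greedy quantifier

preg_match_all($lazy, $str, $lazyMatches);
preg_match_all($greedy, $str, $greedyMatches);

// Compare the full matches found by each variant.
var_dump($lazyMatches[0] === $greedyMatches[0]);
```

On this input both variants find the same two brace groups, so the comparison prints bool(true).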
