
Regex for parsing text between brackets and parentheses

I want to create a regex that saves all of $text1 and $text2 in two separate arrays. $text1 and $text2 appear in the string in the form ($text1)[$text2].

I wrote this code to parse the text between brackets:

<?php

preg_match_all("/\[[^\]]*\]/", $text, $matches);

?>

It works correctly.

And I wrote another snippet to parse the text between parentheses:

<?php

preg_match('/\([^\)]*\)/', $text, $match);

?>

But it only parses one pair of parentheses, not all of the parentheses in the string :(

So I have two problems:

1) How can I parse the text between all of the parentheses in the string?

2) How can I get $text1 and $text2 as I described at the top?

Please help me. I am confused about regex. If you have a good resource, please share the link. Thanks ;)

Use preg_match_all() with the following regular expression:

/(\[.+?\])(\(.+?\))/i

Details

/                   # begin pattern
    (               # first group, brackets
        \[          # literal bracket
            .+?     # any character, one or more times, lazily
        \]          # literal bracket, close
    )               # first group, close
    (               # second group, parentheses
        \(          # literal parenthesis
            .+?     # any character, one or more times, lazily
        \)          # literal parenthesis, close
    )               # second group, close
/i                  # end pattern, case-insensitive modifier

This saves every bracketed substring in one array and every parenthesized substring in another (the delimiters are included, because they sit inside the capture groups). So, in PHP:

<?php
$s = "[test1](test2) testing the regex [test3](test4)";
preg_match_all("/(\[.+?\])(\(.+?\))/i", $s, $m);
var_dump($m[1]); // bracket group
var_dump($m[2]); // parentheses group

The only reason you were failing to capture multiple ( ) wrapped substrings is that you were calling preg_match() instead of preg_match_all().
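To make the difference concrete, here is a minimal sketch that runs the same pattern through both functions. The sample string and variable names are made up for illustration; only preg_match() and preg_match_all() come from the discussion above.

```php
<?php
// Hypothetical sample input in the asker's ($text1)[$text2] format.
$text = '(a)[1] some filler (b)[2]';

// preg_match() stops after the first successful match.
preg_match('/\([^)]*\)/', $text, $first);

// preg_match_all() keeps scanning and collects every match.
preg_match_all('/\([^)]*\)/', $text, $all);

var_export($first);  // one element: '(a)'
echo "\n";
var_export($all[0]); // two elements: '(a)' and '(b)'
echo "\n";
```

With preg_match_all() the full-string matches sit in index 0 of the result array, so $all[0] is the list you were expecting from preg_match().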

A couple of small points:

  1. The ) inside your negated character class doesn't need to be escaped.
  2. The closing square bracket (at the end of your pattern) doesn't need to be escaped; regex will not mistake it for the end of a character class.
  3. There is no need to declare the i pattern modifier; there are no letters in the pattern for it to affect.

Combine your two patterns into one, bake in my small points, and you have a fully refined/optimized pattern.
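A quick way to convince yourself of the first two points is to run the escaped and unescaped variants side by side. This is only a sketch on a made-up sample string; the variable names are hypothetical.

```php
<?php
// Hypothetical sample input.
$text = '(a)[1] and (b)[2]';

// Point 1: ")" inside a negated character class, escaped vs. unescaped.
preg_match_all('/\([^\)]*\)/', $text, $escapedParen);
preg_match_all('/\([^)]*\)/', $text, $plainParen);

// Point 2: the final "]" outside of any character class, escaped vs. unescaped.
preg_match_all('/\[[^\]]*\]/', $text, $escapedBracket);
preg_match_all('/\[[^\]]*]/', $text, $plainBracket);

var_dump($escapedParen === $plainParen);     // bool(true)
var_dump($escapedBracket === $plainBracket); // bool(true)
```

Note that the ] inside the negated class [^\]] is still escaped; only the one outside the class can be left bare.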

In case you don't know why your patterns are great, I'll explain. When you ask the regex engine to match "greedily", it can move more efficiently (take fewer steps).

By using a negated character class, you can employ greedy matching. If you only use . then you have to use "lazy" matching ( *? ) to ensure that matching doesn't "go too far".
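The "go too far" problem is easy to reproduce. The sketch below, on a made-up sample string, compares a greedy dot, a lazy dot, and a greedy negated character class; the variable names are illustrative only.

```php
<?php
// Hypothetical sample input.
$text = '(a)[1] and (b)[2]';

// Greedy dot: runs to the end of the string, then backtracks to the LAST ")".
preg_match_all('~\((.*)\)~', $text, $greedyDot);

// Lazy dot: expands one character at a time until the first ")".
preg_match_all('~\((.*?)\)~', $text, $lazyDot);

// Greedy negated class: cannot cross a ")", so it stops in the right place.
preg_match_all('~\(([^)]*)\)~', $text, $negated);

var_export($greedyDot[1]); // one overlong capture: 'a)[1] and (b'
echo "\n";
var_export($lazyDot[1]);   // 'a' and 'b'
echo "\n";
var_export($negated[1]);   // 'a' and 'b'
echo "\n";
```

The lazy dot and the negated class return the same captures here; the difference between them is in how much work the engine does to get there.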

Pattern: ~\(([^)]*)\)\[([^\]]*)]~ (11 steps)

The above will capture zero or more characters between the parentheses as Capture Group #1, and zero or more characters between the square brackets as Capture Group #2.

If you KNOW that your target strings will obey your strict format, you can even remove the final ] from the pattern to improve efficiency. (10 steps)
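As a rough check of that claim, the sketch below runs the pattern with and without the final ] on a sample string that does follow the strict format. The variable names are hypothetical. The capture groups come out the same; only the full-string matches differ, because the shorter pattern never consumes the closing bracket.

```php
<?php
// Sample input that obeys the strict (text1)[text2] format.
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

preg_match_all('~\(([^)]*)\)\[([^\]]*)]~', $string, $withBracket);
preg_match_all('~\(([^)]*)\)\[([^\]]*)~', $string, $withoutBracket);

// Capture groups #1 and #2 are identical for both patterns.
var_dump($withBracket[1] === $withoutBracket[1]); // bool(true)
var_dump($withBracket[2] === $withoutBracket[2]); // bool(true)

// The full-string matches of the shorter pattern lack the trailing "]".
var_export($withoutBracket[0]);
echo "\n";
```

If the input might contain an unclosed [ (for example "(x)[y" at the end of the text), the shorter pattern would still match it, so keep the final ] whenever the format is not guaranteed.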

Compare this with lazy . matching: ~\((.*?)\)\[(.*?)]~ (35 steps), and that's only on your little 16-character input string. As your text increases in length (I can only imagine that you are targeting these substrings inside a much larger block of text), the performance impact will become greater.

My point is, always try to design patterns that use "greedy" quantifiers in pursuit of the best / most efficient pattern. (Further tips on improving efficiency: avoid piping ( | ), avoid capture groups, and avoid lookarounds whenever reasonable, because they cost steps.)

Code:

$string='Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

var_export(preg_match_all('~\(([^)]*)\)\[([^\]]*)]~',$string,$out)?array_slice($out,1):[]);

Output: (I trimmed off the fullstring matches with array_slice())

array (
  0 => 
  array (
    0 => '11 steps',
    1 => '35 steps',
  ),
  1 => 
  array (
    0 => '1',
    1 => '2',
  ),
)
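Since the question asks for two separate arrays, one possible way to unpack the result is array destructuring. This is a sketch that assumes PHP 7.1 or later; $text1List and $text2List are made-up names.

```php
<?php
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

// Default to empty arrays in case nothing matches.
$text1List = [];
$text2List = [];

if (preg_match_all('~\(([^)]*)\)\[([^\]]*)]~', $string, $out)) {
    // Skip index 0 (the fullstring matches) and keep the two capture groups.
    [, $text1List, $text2List] = $out;
}

var_export($text1List); // '11 steps' and '35 steps'
echo "\n";
var_export($text2List); // '1' and '2'
echo "\n";
```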

Or, depending on your use, with PREG_SET_ORDER:

Code:

$string='Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

var_export(preg_match_all('~\(([^)]*)\)\[([^\]]*)]~',$string,$out,PREG_SET_ORDER)?$out:[]);

Output:

array (
  0 => 
  array (
    0 => '(11 steps)[1]',
    1 => '11 steps',
    2 => '1',
  ),
  1 => 
  array (
    0 => '(35 steps)[2]',
    1 => '35 steps',
    2 => '2',
  ),
)
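PREG_SET_ORDER is convenient when each ($text1)[$text2] pair should be handled together. A minimal sketch of looping over the sets, assuming PHP 7.1 or later for the destructuring syntax; the $pairs structure and its keys are invented for illustration.

```php
<?php
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

$pairs = [];
if (preg_match_all('~\(([^)]*)\)\[([^\]]*)]~', $string, $sets, PREG_SET_ORDER)) {
    // Each set holds: fullstring match, capture group #1, capture group #2.
    foreach ($sets as [$full, $text1, $text2]) {
        $pairs[] = ['text1' => $text1, 'text2' => $text2];
    }
}

var_export($pairs);
echo "\n";
```

Each element of $pairs then carries one parenthesized value together with its bracketed partner, which avoids having to line up two parallel arrays by index.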
