
Regex for parsing text between brackets and parenthesis

I want to create a regex that saves all of the $text1 and $text2 values in two separate arrays, where $text1 and $text2 appear in the string in the form ($text1)[$text2].

I wrote this code to parse the text between brackets:

<?php

preg_match_all("/\[[^\]]*\]/", $text, $matches);

?>

It works correctly.
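For reference, here is a minimal sketch of what that call returns, using a made-up input string. The full matches in $matches[0] keep the brackets; wrapping the negated class in a capture group (an addition, not the asker's code) gives just the inner text in $inner[1].

```php
<?php
// Made-up sample input in the ($text1)[$text2] format
$text = '(foo)[bar] and (baz)[qux]';

// The bracket pattern from the question: each match includes the brackets
preg_match_all('/\[[^\]]*\]/', $text, $matches);
// $matches[0] is ['[bar]', '[qux]']

// Capturing the negated class yields the text without the brackets
preg_match_all('/\[([^\]]*)\]/', $text, $inner);
// $inner[1] is ['bar', 'qux']
```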

And I wrote this code to parse the text between parentheses:

<?php

preg_match('/\([^\)]*\)/', $text, $match);

?>

But it only parses the text inside one pair of parentheses, not all of the parentheses in the string :(

So I have two problems:

1) How can I parse the text between all of the parentheses in the string?

2) How can I get $text1 and $text2 as I described at the top?

Please help me. I am confused about regex. If you have a good resource, please share the link. Thanks ;)

Use preg_match_all() with the following regular expression:

/(\[.+?\])(\(.+?\))/i


Details

/                   # begin pattern
    (               # first group, brackets
        \[          # literal opening bracket
            .+?     # any character, one or more times, lazily (as few as possible)
        \]          # literal closing bracket
    )               # first group, close
    (               # second group, parentheses
        \(          # literal opening parenthesis
            .+?     # any character, one or more times, lazily (as few as possible)
        \)          # literal closing parenthesis
    )               # second group, close
/i                  # end pattern
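PHP will also accept this commented breakdown as the actual pattern if you add the x (extended) modifier, which ignores whitespace and treats # as the start of a comment. A sketch of that, using the same sample string as the PHP code that follows; the i modifier is dropped here because it has no effect on a pattern without letters:

```php
<?php
// Extended (x) mode: whitespace is ignored and "#" starts a comment
$pattern = '/
    (               # first group, brackets
        \[          # literal opening bracket
            .+?     # any character, one or more times, lazily
        \]          # literal closing bracket
    )
    (               # second group, parentheses
        \(          # literal opening parenthesis
            .+?     # any character, one or more times, lazily
        \)          # literal closing parenthesis
    )
/x';

$s = '[test1](test2) testing the regex [test3](test4)';
preg_match_all($pattern, $s, $m);
// $m[1] is ['[test1]', '[test3]'], $m[2] is ['(test2)', '(test4)']
```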

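Used with preg_match_all(), this saves everything between brackets in one array and everything between parentheses in another (the brackets and parentheses themselves are included in the captures). Note that it expects the [...] part before the (...) part, the reverse of the ($text1)[$text2] order in the question. So, in PHP: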

<?php
$s = "[test1](test2) testing the regex [test3](test4)";
preg_match_all("/(\[.+?\])(\(.+?\))/i", $s, $m);
var_dump($m[1]); // bracket group
var_dump($m[2]); // parentheses group
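If you only want the inner text, in the question's ($text1)[$text2] order, one possible variant (an adjustment, not part of the answer above) moves each capture group inside its delimiters and swaps the two parts:

```php
<?php
// Parentheses first, then brackets; the groups capture only the inner text
$s = '(test1)[test2] testing the regex (test3)[test4]';
preg_match_all('/\((.+?)\)\[(.+?)\]/', $s, $m);
// $m[1] is ['test1', 'test3'] and $m[2] is ['test2', 'test4']
```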


The only reason you were failing to capture multiple ( ) wrapped substrings is that you were calling preg_match() instead of preg_match_all().
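To see the difference side by side, here is a small sketch with a made-up input: preg_match() returns after the first match, while preg_match_all() keeps scanning, collects every match, and returns how many it found.

```php
<?php
$text = '(one)[1] and (two)[2]'; // made-up sample input

// preg_match() stops after the first match and returns 1 (or 0 if none)
$count = preg_match('/\([^)]*\)/', $text, $match);
// $count is 1, $match is ['(one)']

// preg_match_all() returns the total number of matches and collects them all
$countAll = preg_match_all('/\([^)]*\)/', $text, $matches);
// $countAll is 2, $matches[0] is ['(one)', '(two)']
```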

A few small points:

  1. The ) inside of your negated character class didn't need to be escaped.
  2. The closing square bracket (at the end of your pattern) doesn't need to be escaped; regex will not mistake it to mean the end of a character class.
  3. There is no need to declare the i pattern modifier; your pattern has no letters for it to modify.
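A quick way to check the first two points is to run both versions of the asker's patterns against the same made-up input and compare the results:

```php
<?php
$text = '(one)[1] and (two)[2]'; // made-up sample input

// The question's patterns, with the unnecessary escapes
preg_match_all('/\([^\)]*\)/', $text, $a);
preg_match_all('/\[[^\]]*\]/', $text, $b);

// Without them: ")" inside a character class is literal, and a "]"
// outside any character class is literal as well
preg_match_all('/\([^)]*\)/', $text, $c);
preg_match_all('/\[[^\]]*]/', $text, $d);
// $a and $c are identical, and so are $b and $d
```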

Combine your two patterns into one and bake in my small points and you have a fully refined/optimized pattern.

In case you don't know why your patterns are great, I'll explain. When you ask the regex engine to match "greedily", it can move more efficiently (take fewer steps).

By using a negated character class, you can employ greedy matching. If you only use . then you have to use "lazy" matching (*?) to ensure that matching doesn't "go too far".
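The negated class also affects what can match, not just speed. In this sketch (made-up input with a stray (a) before the real pair), the lazy version keeps expanding past the first ) because .*? is still allowed to consume ), while [^)]* never can:

```php
<?php
$text = '(a) b (c)[d]'; // made-up input: a stray "(a)" precedes the real pair

// Lazy dot: .*? grows past ")" until ")[" finally follows
preg_match('~\((.*?)\)\[(.*?)]~', $text, $lazy);
// $lazy[0] is '(a) b (c)[d]', $lazy[1] is 'a) b (c'

// Negated class: [^)]* cannot cross a ")", so the match starts at "(c)"
preg_match('~\(([^)]*)\)\[([^\]]*)]~', $text, $negated);
// $negated[0] is '(c)[d]', $negated[1] is 'c'
```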

Pattern: ~\(([^)]*)\)\[([^\]]*)]~ (11 steps)

The above will capture zero or more characters between the parentheses as Capture Group #1, and zero or more characters between the square brackets as Capture Group #2.

If you KNOW that your target strings will obey your strict format, you can even remove the final ] from the pattern to improve efficiency. (10 steps)
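A sketch of that shortcut on the demo string used further down: the greedy [^\]]* already stops right before ], so the capture groups are unchanged and only the full match loses its final ]. On malformed input such as (x)[y with no closing bracket, this version would still match, which is why it only suits strictly formatted text.

```php
<?php
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

// Same pattern with the trailing "]" removed
preg_match_all('~\(([^)]*)\)\[([^\]]*)~', $string, $out);
// $out[0] is ['(11 steps)[1', '(35 steps)[2']
// $out[1] is ['11 steps', '35 steps'], $out[2] is ['1', '2']
```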

Compare the negated-class pattern with lazy . matching: ~\((.*?)\)\[(.*?)]~ (35 steps), and that's only on your little 16-character input string. As your text grows (I imagine you are targeting these substrings inside a much larger block of text), the performance impact will become greater.

My point is: always try to design patterns that use "greedy" quantifiers in pursuit of the best, most efficient pattern. (Further tips for improving efficiency: avoid alternation (|), capture groups, and lookarounds whenever reasonable, because they cost steps.)

Code:

$string='Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

var_export(preg_match_all('~\(([^)]*)\)\[([^\]]*)]~',$string,$out)?array_slice($out,1):[]);

Output (I trimmed off the full-string matches with array_slice()):

array (
  0 => 
  array (
    0 => '11 steps',
    1 => '35 steps',
  ),
  1 => 
  array (
    0 => '1',
    1 => '2',
  ),
)
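To get the two arrays under the $text1 and $text2 names from the question, one option is to destructure the result and skip index 0 (array destructuring with a skipped element requires PHP 7.1 or later):

```php
<?php
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

// Default to empty arrays in case nothing matches
$text1 = $text2 = [];
if (preg_match_all('~\(([^)]*)\)\[([^\]]*)]~', $string, $out)) {
    // Skip the full-match array at index 0, keep the two capture groups
    [, $text1, $text2] = $out;
}
// $text1 is ['11 steps', '35 steps'], $text2 is ['1', '2']
```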

Or, depending on your use (with PREG_SET_ORDER):

Code:

$string='Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

var_export(preg_match_all('~\(([^)]*)\)\[([^\]]*)]~',$string,$out,PREG_SET_ORDER)?$out:[]);

Output:

array (
  0 => 
  array (
    0 => '(11 steps)[1]',
    1 => '11 steps',
    2 => '1',
  ),
  1 => 
  array (
    0 => '(35 steps)[2]',
    1 => '35 steps',
    2 => '2',
  ),
)
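With PREG_SET_ORDER, each set can be unpacked directly inside a foreach. A minimal sketch; the "text1 => text2" string format is only for illustration:

```php
<?php
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

preg_match_all('~\(([^)]*)\)\[([^\]]*)]~', $string, $out, PREG_SET_ORDER);

// Each set is [full match, group 1, group 2]; skip the full match
$pairs = [];
foreach ($out as [, $text1, $text2]) {
    $pairs[] = "$text1 => $text2";
}
// $pairs is ['11 steps => 1', '35 steps => 2']
```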
