
Regex: problem with a backreference in a pattern with preg_match_all

I wonder what the problem with the backreference is here:

preg_match_all('/__\((\'|")([^\1]+)\1/', "__('match this') . 'not this'", $matches);

It is expected to match the string between the quotes in __(''), but it actually returns:

match this') . 'not this

Any ideas?

You can't use a backreference inside a character class because a character class matches exactly one character, and a backreference can potentially match any number of characters, or none.

What you're trying to do requires a negative lookahead, not a negated character class:

preg_match_all('/__\(([\'"])(?:(?!\1).)+\1\)/',
    "__('match this') . 'not this'", $matches);

I also changed your alternation (\'|") to a character class ([\'"]) because it's much more efficient, and I escaped the outer parentheses so that they match literal parentheses.
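If you also want the text between the quotes rather than the whole call, one option is to wrap the lookahead loop in a second capturing group. This is a minimal sketch, not the answer's exact code; the `$pattern`/`$subject` names and the extra `__("it's fine")` call in the sample string are made up for illustration:

```php
<?php
// Sketch: the negative-lookahead pattern from above, plus a second
// capturing group around (?:(?!\1).)+ so the quoted text is returned.
// Group 1 = the opening quote, group 2 = the text up to the matching quote.
$pattern = '/__\(([\'"])((?:(?!\1).)+)\1\)/';
$subject = "__('match this') . 'not this' . __(\"it's fine\")";

preg_match_all($pattern, $subject, $matches);

// With the default PREG_PATTERN_ORDER, $matches[2] holds every group-2 capture.
foreach ($matches[2] as $text) {
    echo $text, "\n";
}
// match this
// it's fine
```

Because `(?!\1)` only refuses the quote that opened the call, the other quote character may appear inside, which is why `it's fine` comes back intact.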


EDIT: I guess I need to expand on that "more efficient" remark. I took the example Jeffrey Friedl used to demonstrate this point and tested it in RegexBuddy.

Applied to the target text abababdedfg,
^[a-g]+$ reports success after three steps, while
^(?:a|b|c|d|e|f|g)+$ takes 55 steps.

And that's for a successful match. When I try it on abababdedfz,
^[a-g]+$ reports failure after 21 steps, while
^(?:a|b|c|d|e|f|g)+$ takes 99 steps.

In this particular case the impact on performance is so trivial it's not even worth mentioning. I'm just saying whenever you find yourself choosing between a character class and an alternation that both match the same things, you should almost always go with the character class. Just a rule of thumb.
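The step counts above come from RegexBuddy's debugger and aren't something PHP reports. What you can check from PHP is that the two forms are interchangeable, i.e. they accept and reject the same strings. A small sketch using the two target texts from the example (the `$results` array is just for illustration):

```php
<?php
// Sketch: a character class and the equivalent alternation accept and
// reject the same strings. This does not measure speed; the step counts
// quoted above come from RegexBuddy, not from PHP.
$class       = '/^[a-g]+$/';
$alternation = '/^(?:a|b|c|d|e|f|g)+$/';

$results = [];
foreach (['abababdedfg', 'abababdedfz'] as $subject) {
    // preg_match() returns 1 on a match and 0 on no match.
    $results[$subject] = [preg_match($class, $subject), preg_match($alternation, $subject)];
    printf("%s: class=%d alternation=%d\n", $subject, $results[$subject][0], $results[$subject][1]);
}
// abababdedfg: class=1 alternation=1
// abababdedfz: class=0 alternation=0
```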

I'm surprised it didn't give you an unbalanced parenthesis error message.

 /
   __
   (
       (\'|")
       ([^\1]+)
       \1
 /
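To see the difference concretely: written with an unescaped `(` after `__`, as in the layout above, the pattern opens three groups but closes only two, so PCRE can't compile it and `preg_match()` returns `false` with a warning. Escaping that parenthesis gives a pattern that compiles. A quick sketch (the variable names are illustrative, and the `@` only suppresses the compile warning for the demo):

```php
<?php
// Sketch: with an unescaped "(" after "__" (as laid out above) the pattern
// opens three groups but closes only two, so PCRE cannot compile it.
// preg_match() then raises a warning and returns false; "@" hides the
// warning only to keep this demo quiet.
$unbalanced = '/__((\'|")([^\1]+)\1/';
$escaped    = '/__\((\'|")([^\1]+)\1/';
$subject    = "__('match this') . 'not this'";

$r1 = @preg_match($unbalanced, $subject); // false: compilation failed
$r2 = @preg_match($escaped, $subject);    // 1: compiles and matches
var_dump($r1, $r2);
```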

This [^\1] will not take the contents of capture buffer 1 and put it into a character
class. Inside a character class PCRE reads \1 as an octal escape, so [^\1] is the same as all characters except the one with code 1 (\x01).
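In PCRE (which the `preg_*` functions use), a backslash followed by digits inside a character class is read as an octal escape, so `[^\1]` means "any character except the one with code 1". Quotes are therefore allowed, which is why `[^\1]+` in the question ran to the end of the subject and then backtracked only as far as the last `'`. A minimal check, with made-up sample strings:

```php
<?php
// Sketch: inside a character class, PCRE reads \1 as an octal escape
// (character code 1), not as a backreference. So [^\1] excludes only "\x01".
$pattern = '/^[^\1]+$/';

$r1 = preg_match($pattern, "match this') . 'not this1"); // 1: quotes and "1" are fine
$r2 = preg_match($pattern, "abc\x01def");                  // 0: chr(1) is rejected
var_dump($r1, $r2);
```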

Try this:

/__\(('|").*?\1\).*/

You can add an inner capturing group to capture just what's between the quotes:
/__\(('|")(.*?)\1\).*/
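Here is that lazy version run against the question's input, as a minimal sketch (variable names are illustrative):

```php
<?php
// Sketch: the lazy pattern with an inner capturing group.
// Group 1 = opening quote; (.*?) = group 2; \1\) = same quote, then ")".
$pattern = '/__\((\'|")(.*?)\1\).*/';
$subject = "__('match this') . 'not this'";

if (preg_match($pattern, $subject, $m)) {
    echo $m[2], "\n"; // match this
}
```

The trailing `.*` only stretches `$m[0]` to the end of the subject; the capture in `$m[2]` doesn't depend on it.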

Edit: If no inner delimiter is allowed, use Qtax's regex, since ('|").*?\1, even though non-greedy, will still match everything up to the trailing anchor.
In this case the whole of __('all'this'will"match') is matched, and it's better to use ('[^']*'|"[^"]*") as
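The difference shows up with the sample call from the edit. Below, `$delimited` is my own composition of the quote-delimited alternation inside `__\(...\)`, an assumption rather than code from the thread, and the variable names are illustrative:

```php
<?php
// Sketch: the lazy backreference pattern keeps expanding past inner quotes
// until it finds a quote followed by ")", whereas the delimited alternation
// (hypothetical composition: '[^']*'|"[^"]*" wrapped in __\(...\)) forbids
// the delimiter inside the quoted part.
$lazy      = '/__\((\'|")(.*?)\1\)/';
$delimited = '/__\((\'[^\']*\'|"[^"]*")\)/';

$tricky = "__('all'this'will\"match')";
$normal = "__('match this') . 'not this'";

$r1 = preg_match($lazy, $tricky, $m1);      // 1: group 2 spans the inner quotes
$r2 = preg_match($delimited, $tricky, $m2); // 0: a stray ' ends the literal early
$r3 = preg_match($delimited, $normal, $m3); // 1: group 1 is 'match this'

echo $m1[2], "\n"; // all'this'will"match
```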

You could use something like: /__\(("[^"]+"|'[^']+')\)/
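A sketch of a pattern like that in PHP. The captured group includes the quotes, so this strips them with `substr()`; the double-quoted `__("double")` call in the sample is added only to show both quote styles. This is an illustration, not code from the answer:

```php
<?php
// Sketch: one alternative per quote style, each forbidding its own quote
// inside, so no backreference is needed. Group 1 includes the quotes.
$pattern = '/__\(("[^"]+"|\'[^\']+\')\)/';
$subject = "__('match this') . 'not this' . __(\"double\")";

preg_match_all($pattern, $subject, $matches);

// Drop the surrounding quote characters from each captured literal.
$texts = array_map(function ($s) {
    return substr($s, 1, -1);
}, $matches[1]);

print_r($texts);
```

Because both alternatives use `+`, an empty call such as `__('')` is not matched; switch to `*` if that case matters.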

Make your regex ungreedy:

preg_match_all('/__\((\'|")([^\1]+)\1/U', "__('match this') . 'not this'", $matches);
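For reference, a sketch of that approach with the first parenthesis escaped so the pattern compiles. The `U` modifier (`PCRE_UNGREEDY`) makes quantifiers lazy by default, so it should behave the same as writing `+?` explicitly; the second call shows that equivalent form (variable names are illustrative):

```php
<?php
// Sketch: the question's pattern, with the first "(" escaped so it compiles,
// plus the U (PCRE_UNGREEDY) modifier. U makes quantifiers lazy by default,
// so [^\1]+ behaves like [^\1]+? and stops at the first quote that fits \1.
$subject = "__('match this') . 'not this'";

preg_match_all('/__\((\'|")([^\1]+)\1/U', $subject, $ungreedy);

// The same effect without U: mark just this quantifier lazy with "+?".
preg_match_all('/__\((\'|")([^\1]+?)\1/', $subject, $lazy);

echo $ungreedy[2][0], "\n"; // match this
echo $lazy[2][0], "\n";     // match this
```

Note that `[^\1]` still doesn't refer to group 1 here; the lazy quantifier is what makes it stop at the first matching quote. Since `U` flips every quantifier in the pattern, an explicit `+?` on the one quantifier that needs it is usually easier to read.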
