
Regex: problem with backreference in pattern with preg_match_all

I wonder what the problem is with the backreference here:

preg_match_all('/__\((\'|")([^\1]+)\1/', "__('match this') . 'not this'", $matches);

It is expected to match the string inside __('...'), but it actually returns:

match this') . 'not this

Any ideas?

You can't use a backreference inside a character class, because a character class matches exactly one character, while a backreference can potentially match any number of characters, or none.

What you're trying to do requires a negative lookahead, not a negated character class:

preg_match_all('/__\(([\'"])(?:(?!\1).)+\1\)/',
    "__('match this') . 'not this'", $matches);

I also changed your alternation, \'|", to a character class, [\'"], because it's much more efficient, and I escaped the outer parentheses so that they match literal parentheses.
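To make the lookahead approach concrete, here is a minimal sketch. It adds one extra capture group around the quoted text (an assumption on my part, the pattern above does not capture it), so the content is available as group 2. The second input string is a made-up example to show that the other quote character is allowed inside.

```php
<?php
// Group 1 captures the opening quote; (?!\1). accepts any character
// that is not that same quote; group 2 (added here for illustration)
// captures the text between the quotes.
$pattern = '/__\(([\'"])((?:(?!\1).)+)\1\)/';

// Input from the question.
$subject = "__('match this') . 'not this'";
preg_match_all($pattern, $subject, $matches);
print_r($matches[2]); // quoted text of every __('...') call found

// Hypothetical second input: a single quote inside double quotes.
$subject2 = '__("it\'s here")';
preg_match($pattern, $subject2, $m2);
echo $m2[2], "\n";
```

Because the lookahead is checked before every character, the match stops at the first occurrence of whichever quote opened the string.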


EDIT: I guess I need to expand on that "more efficient" remark. I took the example Friedl used to demonstrate this point and tested it in RegexBuddy.

Applied to the target text abababdedfg,
^[a-g]+$ reports success after three steps, while
^(?:a|b|c|d|e|f|g)+$ takes 55 steps.

And that's for a successful match. When I try it on abababdedfz,
^[a-g]+$ reports failure after 21 steps;
^(?:a|b|c|d|e|f|g)+$ takes 99 steps.

In this particular case the impact on performance is so trivial it's not even worth mentioning. I'm just saying that whenever you find yourself choosing between a character class and an alternation that both match the same things, you should almost always go with the character class. Just a rule of thumb.
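If you want to try the comparison in PHP itself, something like the following rough sketch could be used. It assumes the character class meant above is [a-g] (the range a through g), and time_pattern is a hypothetical helper, not a library function. It measures wall-clock time, not the step counts RegexBuddy reports, and the engine's own optimizations may narrow or remove the gap, so treat any numbers as indicative only.

```php
<?php
// Hypothetical helper: runs $pattern against $subject $n times and
// returns [result of the last preg_match call, elapsed seconds].
function time_pattern(string $pattern, string $subject, int $n = 100000): array
{
    $start = hrtime(true); // nanoseconds, PHP 7.3+
    $result = 0;
    for ($i = 0; $i < $n; $i++) {
        $result = preg_match($pattern, $subject);
    }
    return [$result, (hrtime(true) - $start) / 1e9];
}

$patterns = ['/^[a-g]+$/', '/^(?:a|b|c|d|e|f|g)+$/'];
$subjects = ['abababdedfg', 'abababdedfz'];
$results  = [];

foreach ($patterns as $p) {
    foreach ($subjects as $s) {
        [$matched, $seconds] = time_pattern($p, $s);
        $results[$p][$s] = $matched; // 1 = match, 0 = no match
        printf("%-26s %-12s matched=%d  %.4fs\n", $p, $s, $matched, $seconds);
    }
}
```

Both patterns should agree on what matches; only the time spent getting there is expected to differ.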

I'm surprised it didn't give you an unbalanced parenthesis error message.

 /
   __
   (
       (\'|")
       ([^\1]+)
       \1
 /

The \1 in [^\1] will not take the contents of capture group 1 and put it into the character
class. Inside a character class, PCRE reads \1 as an octal escape (the character with code 1), so [^\1] matches any character other than that control character, including both quote characters.
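A quick way to see this for yourself is the small sketch below, using a shortened pattern and a made-up input string. If the class really excluded the captured quote, group 2 would stop at the second quote; it does not.

```php
<?php
// Shortened version of the pattern from the question:
// group 1 captures a quote, group 2 uses the [^\1] class.
$pattern = '/(\'|")([^\1]+)/';

// Hypothetical input with a second single quote in the middle.
preg_match($pattern, "'a'b", $m);

// Group 2 runs straight past the second quote, so the class is
// not acting as "anything except what group 1 captured".
var_dump($m[1], $m[2]);
```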

Try this:

/__\(('|").*?\1\).*/

You can add an inner capturing group to capture just what's between the quotes:
/__\(('|")(.*?)\1\).*/

Edit: If no inner delimiter is allowed, use Qtax's regex.
Even though it is non-greedy, ('|").*?\1 will still match everything up to the trailing anchor, in this case all of __('all'this'will"match'), so it is better to use ('[^']*'|"[^"]*").
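The difference is easy to see side by side. This is only a sketch: the lazy pattern is written without the trailing .* so the two are directly comparable, and the strict pattern assumes both alternatives are meant to carry their closing quote.

```php
<?php
// Input with stray quotes inside the delimiters.
$subject = "__('all'this'will\"match')";

// Lazy dot: keeps expanding until a quote followed by ")" is found,
// so the inner quotes end up inside group 2.
$lazyCount = preg_match('/__\((\'|")(.*?)\1\)/', $subject, $lazy);

// Quote-specific classes: the content can never contain its own
// delimiter, so this input is rejected.
$strictPattern = '/__\(("[^"]*"|\'[^\']*\')\)/';
$strictCount = preg_match($strictPattern, $subject, $strict);

// On well-formed input the strict pattern matches; group 1 includes the quotes.
$okCount = preg_match($strictPattern, "__('match this')", $ok);

var_dump($lazy[2], $strictCount, $ok[1]);
```

Which behaviour you want depends on whether a stray delimiter inside the string should be accepted or treated as malformed input.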

You could use something like: /__\(("[^"]+"|'[^']+')\)/

Make your regex ungreedy:

preg_match_all('/__\((\'|")([^\1]+)\1/U', "__('match this') . 'not this'", $matches);
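For comparison, here is a small sketch of what the U modifier changes. It uses a plain .+ in place of the [^\1]+ class (my substitution, to keep the focus on greediness) and runs the same pattern with and without the modifier.

```php
<?php
$subject = "__('match this') . 'not this'";

// Default (greedy): .+ grabs as much as possible, then backtracks
// to the LAST quote in the string.
preg_match('/__\((\'|")(.+)\1/', $subject, $greedy);

// With U, quantifiers are lazy by default, so .+ stops at the
// FIRST quote that lets the rest of the pattern match.
preg_match('/__\((\'|")(.+)\1/U', $subject, $ungreedy);

var_dump($greedy[2], $ungreedy[2]);
```

Note that U flips the default for every quantifier in the pattern, which can be surprising in longer expressions; writing +? on the one quantifier that needs it is the more targeted option.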
