
PHP PCRE regular expression

In LaTeX, the expression \o{a}{b} means the operator 'o' takes two arguments a and b. LaTeX also accepts \o{a}, and in this case treats the second argument as the empty string.

Now I try to match the regex \\\\o\{([\s\S]*?)\}\{([\s\S]*?)\} against the string \o{a}\o{a}{b}. It mistakenly treats the whole string as a match when it isn't. (The correct interpretation of this string is that the substring \o{a}{b} is the only match.) The point is that I need to know how to tell PHP that if anything other than { follows the first }, then it is not a match.

How should I do that?

Edit: Arguments of an operator are allowed to contain the symbols \, { and }. But in this case the reason the whole string is not a match is that the curly brackets in a}\o{a do not conform to LaTeX rules (e.g. { must come before }), so a}\o{a cannot be an argument of an operator.

Edit2: On the other hand, \o{{a}}{b} should be a match, as {a} is a valid argument.

I suggest something like this:

$s = '\\o{a}\\o{a}{b}';
echo "$s\n";  # Check string
preg_match('~\\\o(\{(?>[^{}\\\]++|(?1)|\\\.)+\}){2}~', $s, $match);
print_r($match);


The regex:

  • uses recursion to deal with nested braces,
  • also takes backslashes into account ( [^{}\\\] and \\\. ) so that escaped literal braces are not mistaken for syntactic braces.

\\\o             # Matches \o
(                # Group 1, recursed into below via (?1)
  \{             # Matches {
  (?>            # Begin atomic group (no backtracking inside; makes the regex faster)
     [^{}\\\]++  # Any characters except braces and backslash
  |
     (?1)        # Or recurse the outer group
  |
     \\\.        # Or match an escaped character
  )+             # As many times as necessary
  \}             # Closing brace
){2}             # Repeat twice
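
As a quick check of the pattern above, the following sketch runs it against the strings mentioned in the question. The helper name firstMatch and the test string with an escaped brace are illustrative assumptions, not part of the original answer.

```php
<?php
// Pattern from the answer above, unchanged. In a single-quoted PHP string,
// '\\\o' reaches PCRE as \\o, which matches a literal backslash followed by "o".
$pattern = '~\\\o(\{(?>[^{}\\\]++|(?1)|\\\.)+\}){2}~';

// Hypothetical helper: returns the first match, or null if there is none.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$tests = [
    '\\o{a}\\o{a}{b}',   // only the trailing \o{a}{b} should match
    '\\o{{a}}{b}',       // nested braces: the whole string should match
    '\\o{a\\}b}{c}',     // escaped brace inside the first argument
    '\\o{a}',            // one argument only: no match expected
];

foreach ($tests as $subject) {
    echo $subject, ' => ', firstMatch($pattern, $subject) ?? '(no match)', "\n";
}
```

Note that the + after the atomic group requires each argument to be non-empty; replacing it with * should also accept empty arguments such as \o{}{b}.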

The problem with your current regex is that once this part, \\\\o\{([\s\S]*?), has matched, it will try to look for the next \} that is coming, and there it does not matter whether you are using a lazy quantifier or a greedy one: [\s\S] can itself match a }, so the quantifier simply keeps extending until the rest of the pattern succeeds. You need to somehow prevent it from matching a } before the actual \} in the regex is reached.
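
The behaviour described above can be reproduced with a small sketch. The variable names and the brace-excluding variant are illustrative assumptions; the variant only handles flat arguments without nesting.

```php
<?php
$subject = '\\o{a}\\o{a}{b}';   // the string \o{a}\o{a}{b}

// The regex from the question: [\s\S] also matches "}", so the lazy group
// keeps growing until a "}{" sequence finally lets the rest succeed.
$lazy = '~\\\\o\{([\s\S]*?)\}\{([\s\S]*?)\}~';
preg_match($lazy, $subject, $lazyMatch);

// Excluding braces from the arguments stops the group at the first "}".
// (Flat arguments only; nested braces need a recursive pattern.)
$strict = '~\\\\o\{([^{}]*)\}\{([^{}]*)\}~';
preg_match($strict, $subject, $strictMatch);

echo $lazyMatch[0], "\n";    // the whole string, with group 1 = a}\o{a
echo $strictMatch[0], "\n";  // \o{a}{b}
```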

That's why you have to use [^{}], and since you can actually have nested braces inside, this is an ideal situation for recursion.

To deal with possible nested curly brackets, you need to use the recursion feature:

$pattern = <<<'EOD'
~
\\o({(?>[^{}]+|(?-1))*}){2}
~x
EOD;

where (?-1) is a relative reference to the subpattern of the last capturing group (here, group 1).
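
A minimal usage sketch of this pattern with preg_match_all follows. The input string is a made-up example, chosen to include a nested argument and a pair of empty arguments.

```php
<?php
// Same nowdoc pattern as above; the x modifier lets it span several lines.
$pattern = <<<'EOD'
~
\\o({(?>[^{}]+|(?-1))*}){2}
~x
EOD;

// Hypothetical input containing several operators.
$subject = '\\o{a}\\o{a}{b} and \\o{{x}{y}}{z} and \\o{}{}';

// $matches[0] collects every full match found in the subject.
preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
```

Because the inner group is quantified with *, empty arguments such as {} are accepted. This pattern does not give backslash-escaped braces any special treatment.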

I would guess you need to look into using the anchors ^ and $:

$pattern = '/^\\\\o\{.*\}(\{.*\})?$/';  // '\\\\' reaches PCRE as \\, i.e. one literal backslash

I don't know what you consider acceptable values for a and b, so you can replace .* with an appropriate class here.

This allows either the \o{a} or the \o{a}{b} format. To match only \o{a}{b}, modify it to this:

$pattern = '/^\\\\o\{.*\}\{.*\}$/';  // '\\\\' reaches PCRE as \\, i.e. one literal backslash

Based on your last edit, I would suggest replacing .* above with [^{]*, as noted in other answers.
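
A sketch of how that replacement might look, assuming the two-argument form and a literal backslash written as '\\\\' in a single-quoted PHP string (the test strings are illustrative):

```php
<?php
// Anchored variant with [^{]* in place of .* (an assumption about how the
// suggestion would be spelled out). '\\\\' in single quotes gives PCRE \\,
// which matches one literal backslash.
$pattern = '/^\\\\o\{[^{]*\}\{[^{]*\}$/';

var_dump(preg_match($pattern, '\\o{a}{b}'));        // two flat arguments
var_dump(preg_match($pattern, '\\o{a}\\o{a}{b}'));  // the string from the question
var_dump(preg_match($pattern, '\\o{{a}}{b}'));      // nested first argument
```

Because of the anchors, this validates a whole string rather than locating \o{a}{b} inside a longer one, and it rejects nested arguments such as \o{{a}}{b}; a recursive pattern is needed for those cases.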
