How does this backreference condition in a PHP regex work?

My requirements are as follows:

  1. If a string contains the word "cat", "cat" must be matched.
  2. However, if the word "cat" is preceded by the word "dog", then "cat" must also be followed by "dog", and all 3 words must be matched.
  3. This means that the string "dog cat" must not be matched, since the 2nd "dog" isn't present.

Accordingly, I have written the following regex in PHP. It contains a backreference condition:

      $ptn = '@' .                    // PHP delimiter
             '(dog\s*)?' .            // dog
             'cat\s*' .               // cat
             '(?(1)dog)' .            // backreference cond
             '@';                     // PHP delimiter

The regex meets requirement 1:

     $str1b = 'cat';
     preg_match($ptn, $str1b, $matches);
     print_r($matches);

The output is:

Array ( [0] => cat )

The regex also meets requirement 2:

     $str1a = 'dog cat dog';
     preg_match($ptn, $str1a, $matches);
     print_r($matches);

The output is:

Array ( [0] => dog cat dog [1] => dog )

However, I'd like to ask why the array contains 2 elements. Is it because the regex has 2 consuming sub-expressions?

Now for requirement 3. The following data tests it:

      $str1c = 'dog cat';
      preg_match($ptn, $str1c, $matches);
      print_r($matches);

The output here is:

Array ( [0] => cat )

Here, I'd like to ask:

  1. Why is "cat" matched? Since it is preceded by "dog", it should have been followed by "dog" too for a match to occur; otherwise, no match should have occurred.

  2. Is this how the regex is supposed to work?

  3. How can I achieve my 3 requirements?

Here's the code.

I am considering a solution only in PHP.

However, if the word "cat" is preceded by the word "dog", then "cat" must also be followed by "dog" and all 3 words must be matched

TheFourthBird has a good answer using PCRE.

Based on my interpretation, this regex may also work for you:

\b(?:dog cat dog|(?<!\bdog )cat)\b

RegEx Details:

  • \b : Word boundary
  • (?: : Start non-capture group
    • dog cat dog : Match dog cat dog
    • | : OR
    • (?<!\bdog )cat : Match cat if not preceded by the word dog and a space
  • ) : End non-capture group
  • \b : Word boundary
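
As a rough check of the lookbehind pattern above against the strings from the question, a minimal PHP sketch could look like this. The helper name matchOrNull is hypothetical and exists only for this illustration.

```php
<?php
// Pattern from the answer above: either the full phrase, or a "cat"
// that is not directly preceded by the word "dog" and a single space.
$pattern = '/\b(?:dog cat dog|(?<!\bdog )cat)\b/';

// Hypothetical helper: returns the first match, or null when there is none.
function matchOrNull(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

foreach (['cat', 'dog cat dog', 'dog cat', 'cat dog'] as $str) {
    var_dump(matchOrNull($pattern, $str));
}
```

Note that the lookbehind is fixed-length here (one literal space), so it only rejects "dog cat" when exactly one space separates the two words.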

You get 2 elements in the array because the pattern contains a capture group: element 0 holds the full match and element 1 holds what group 1 captured.
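
To see the effect of the capture group on the size of the result array, here is a small sketch that compares a capturing group with a non-capturing one. The two simplified patterns and the subject string are assumptions made for illustration; they are not the asker's full pattern.

```php
<?php
// Illustrative subject string (an assumption, not taken from the question).
$subject = 'dog cat';

preg_match('/(dog\s*)?cat/', $subject, $withGroup);      // capturing group
preg_match('/(?:dog\s*)?cat/', $subject, $withoutGroup); // non-capturing group

print_r($withGroup);    // index 0 is the full match, index 1 is group 1
print_r($withoutGroup); // only index 0
```

A non-capturing group keeps the array to a single element, but it cannot be referenced by a condition such as (?(1)dog), which needs a numbered group.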

With the pattern that you tried, (dog\s*)?cat\s*(?(1)dog), you get a match for cat in dog cat.

This is because the pattern matches dog optionally. If there is a dog, it is captured, and the pattern then tries to match cat.

Then the if clause states: if group 1 is present, match dog. But if group 1 did not participate in the match, the pattern can still match cat on its own, as capture group 1 is optional.

So in dog cat, the attempt that starts at dog eventually fails because the second dog is missing, but the attempt that starts at cat succeeds: group 1 is not set there, so the condition requires nothing further.
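
One way to observe this is to ask preg_match for offsets. The sketch below reuses the pattern from the question; the helper name firstMatch is hypothetical.

```php
<?php
// The pattern from the question, written on one line.
$ptn = '@(dog\s*)?cat\s*(?(1)dog)@';

// Hypothetical helper: returns [matched text, byte offset] of the first
// match, or null when the pattern does not match at all.
function firstMatch(string $pattern, string $subject): ?array
{
    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }
    return $m[0];
}

print_r(firstMatch($ptn, 'dog cat'));     // "cat" is found at offset 4, not 0
print_r(firstMatch($ptn, 'dog cat dog')); // the whole string, at offset 0
```

The offset of 4 for 'dog cat' shows that the engine gave up on the attempt starting at dog and then succeeded with a fresh attempt starting at cat.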


If you want to match all 3 words dog cat dog or only a single cat, and you don't want to match dog cat, you might use

\b(?:dog cat dog|dog cat\b(*SKIP)(*F)|cat)\b
  • \b A word boundary to prevent a partial match
  • (?: Non capture group
    • dog cat dog Match dog cat dog literally
    • | Or
    • dog cat\b(*SKIP)(*F) In case of dog cat, skip the match
    • | Or
    • cat Match only cat
  • ) Close non capture group
  • \b A word boundary

Regex demo | PHP demo

For example

$strings = [
    "cat",
    "dog cat dog",
    "dog cat",
    "cat dog",
    "this cat cat is a test dog cat dog cat"
];
$pattern = "/\b(?:dog cat dog|dog cat\b(*SKIP)(*F)|cat)\b/";
foreach ($strings as $str) {
    preg_match_all($pattern, $str, $matches);
    print_r($matches[0]);
}

Output

Array
(
    [0] => cat
)
Array
(
    [0] => dog cat dog
)
Array
(
)
Array
(
    [0] => cat
)
Array
(
    [0] => cat
    [1] => cat
    [2] => dog cat dog
    [3] => cat
)

An alternative approach using a capture group is to match what you want to avoid and capture what you want to keep. To match spaces, you could use \s, but note that it can also match a newline.

\bdog cat\b(?! dog\b)|\b(dog cat dog|cat)\b

Regex demo
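
With this approach the wanted text is in group 1, and matches where group 1 is empty are the ones to discard. A minimal sketch of how that might be used in PHP follows; the function name keptMatches is hypothetical.

```php
<?php
// Hypothetical helper: returns only the matches that were captured in
// group 1, dropping the "dog cat" matches that the first alternative eats.
function keptMatches(string $subject): array
{
    $pattern = '/\bdog cat\b(?! dog\b)|\b(dog cat dog|cat)\b/';
    preg_match_all($pattern, $subject, $m);

    // In the default PREG_PATTERN_ORDER, a match where group 1 did not
    // participate shows up as an empty string in $m[1].
    $kept = array_filter($m[1], function ($s) {
        return $s !== '';
    });

    return array_values($kept); // reindex from 0
}

print_r(keptMatches('cat'));
print_r(keptMatches('dog cat dog'));
print_r(keptMatches('dog cat'));
print_r(keptMatches('cat dog'));
```

The design choice here is to let the unwanted dog cat be consumed by the first alternative, so that a later attempt cannot start at the cat inside it.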

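If the engine allows a quantifier in a lookbehind assertion, you might also use the pattern below. PHP's PCRE does not accept an unbounded quantifier such as * inside a lookbehind, so this variant is only for engines that do.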

\bdog cat dog\b|(?<!dog *)\bcat\b|cat(?= *dog\b)

Regex demo

Here's a solution that works with standard syntax.

It matches "dog cat dog" and "cat" but NOT "dog cat".

I am unable to post the regex here, since SO claims it isn't indented properly (though it is). Please check the link for the regex.
