
PCRE regular expression overlapping matches

I have the following string:

001110000100001100001

and this expression:

/[1]....[1]/g

This makes two matches.

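To see why only two matches come back, here is a small PHP sketch (an illustration added here, not part of the question) that prints each match together with its offset using the PREG_OFFSET_CAPTURE flag of preg_match_all. After a successful match, the engine resumes right after the consumed text, so a new match can never start on the closing 1 of the previous one.

```php
<?php
// Non-overlapping search: each match consumes six characters, and the next
// attempt starts right after the closing 1, so no match can begin on it.
$str = '001110000100001100001';
preg_match_all('/1....1/', $str, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
// 100001 at offset 4
// 100001 at offset 15
```

The window starting at offset 9 (also 100001) is the one being asked about: it begins on the 1 that ends the first match.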

But I want it to also match the pattern in between, the one that shares an overlapping 1 (with a lookbehind, so to speak).

I have absolutely no clue how this can work. Instead of 0, there can be any characters.

A common trick is to use a capturing group inside an unanchored positive lookahead. Use this regex with preg_match_all:

(?=(1....1))

See regex demo.

The values are in $matches[1]:

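$re = "/(?=(1....1))/";          // zero-width lookahead; group 1 captures each window
$str = "001110000100001100001";
preg_match_all($re, $str, $matches);
print_r($matches[1]);            // 100001 three times (at offsets 4, 9 and 15)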

See the lookahead reference:

Lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not.

If you want to store the match of the regex inside a lookahead, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)).
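A small sketch of the lookahead version with PREG_OFFSET_CAPTURE added (an addition here, to make the mechanics visible): the overall match in $matches[0] is always an empty string because the lookahead consumes nothing, while group 1 carries the six characters it looked at. Since nothing is consumed, preg_match_all tries every position, including the 1 that closes the previous window.

```php
<?php
$str = '001110000100001100001';
preg_match_all('/(?=(1....1))/', $str, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[1] as $i => [$text, $offset]) {
    // The whole match is zero-width, so $matches[0][$i][0] is always ''.
    $whole = $matches[0][$i][0];
    echo "group 1: $text at offset $offset (whole match: '$whole')\n";
}
// group 1: 100001 at offset 4 (whole match: '')
// group 1: 100001 at offset 9 (whole match: '')
// group 1: 100001 at offset 15 (whole match: '')
```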

You can also do it using the \K feature (which sets where the returned match begins) inside a lookbehind:

(?<=\K1)....1


This way, you don't need to create a capture group, and since all characters are consumed (except the first, which is in the lookbehind), the regex engine doesn't have to retry the pattern at the next five positions after a success.

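$str = '001110000100001100001';

// x flag: whitespace in the pattern is ignored. \K inside the lookbehind makes
// the reported match start at the preceding 1 without consuming it.
preg_match_all('~ (?<= \K 1 ) .... 1 ~x', $str, $matches);

print_r($matches[0]);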

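To see why \K has to sit inside the lookbehind rather than simply after the first 1, here is a contrasting sketch (an illustration added here, not from the answer). With /1\K....1/ the leading 1 is matched by the current attempt, so after a success the search resumes past the closing 1 and the window at offset 9 is never tried; \K only trims the leading 1 from the reported text.

```php
<?php
// Contrast: \K outside a lookbehind. The leading 1 is consumed by this attempt,
// so the middle (overlapping) window is skipped; \K only shortens the result.
$str = '001110000100001100001';
preg_match_all('/1\K....1/', $str, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
// 00001 at offset 5
// 00001 at offset 16
```

Inside the lookbehind, the leading 1 is only looked at, not matched by the current attempt, so the next attempt can start right after a closing 1 and reuse it.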

Note that if you are sure the second character is always a zero, 0(?<=\K10)...1 performs better, because the pattern starts with a literal character and PCRE can optimize it by quickly scanning the subject string for possible starting positions.
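Both \K patterns above depend on \K being allowed inside a lookbehind. As far as I know, PCRE2 10.38 and later reject that by default, so whether they compile may depend on the PCRE2 version your PHP is built against. A swapped-in technique that needs no lookaround at all is to call preg_match repeatedly, restarting one byte after the start of each match via its offset argument. overlappingMatches is a hypothetical helper name, not a PHP built-in, and the sketch assumes a single-byte subject (no /u modifier).

```php
<?php
// Hypothetical helper (not a PHP built-in): collects overlapping matches by
// re-running preg_match from one byte after the start of each match found.
function overlappingMatches(string $pattern, string $subject): array
{
    $found = [];
    $offset = 0;
    // Offsets are byte positions; stop once we run past the end of the subject.
    while ($offset <= strlen($subject)
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$text, $start] = $m[0];
        $found[] = [$text, $start];
        $offset = $start + 1; // let the next match begin inside this one
    }
    return $found;
}

$str = '001110000100001100001';
foreach (overlappingMatches('/1....1/', $str) as [$text, $start]) {
    echo "$text at offset $start\n";
}
// 100001 at offset 4
// 100001 at offset 9
// 100001 at offset 15
```

This calls preg_match once per match instead of once in total, so the single preg_match_all with the capturing lookahead remains the simpler choice when it is available.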

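Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.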

 
© 2020-2024 STACKOOM.COM