How to use PHP regular expressions to search a string for word sequences containing repeated words?
I am using PHP to count the number of occurrences of a word sequence in a string. In the following examples, I am not getting the result I would like to see.
$subject1 = " [word1 [word1 [word1 [word1 [word3 ";
$pattern1 = preg_quote("[word1 [word1", '/');
echo "count of '[word1 [word1'=". preg_match_all("/(\s|^|\W)" . $pattern1 . "(?=\s|$|\W)/", $subject1, $dummy) . "<br/>";
$subject2 = " [word1 [word2 [word1 [word2 [word1 [helloagain ";
$pattern2 = preg_quote("[word1 [word2 [word1", '/');
echo "count of '[word1 [word2 [word1'=". preg_match_all("/(\s|^|\W)" . $pattern2 . "(?=\s|$|\W)/", $subject2, $dummy) . "<br/>";
The above returns:
count of '[word1 [word1'=2
count of '[word1 [word2 [word1'=1
I would like the result to be:
count of '[word1 [word1'=3 // there are 3 instances of '[word1 [word1' in $subject1
count of '[word1 [word2 [word1'=2 // there are 2 instances of '[word1 [word2 [word1' in $subject2
One way to achieve this: each time the pattern is found in the subject, the next search should start from the second word of the matched substring. Can such a regular expression be constructed? Thank you.
Use mb_substr_count.
substr_count does not count overlapping occurrences but, for reasons I don't know, mb_substr_count does.
$subject1 = " [word1 [word1 [word1 [word1 [word3 ";
echo mb_substr_count($subject1, "[word1 [word1"); // 3 on PHP 5.2.6
echo mb_substr_count($subject1, "[word1 [word1 [word1"); // 2 on PHP 5.2.6
EDIT:
For future reference: apparently mb_substr_count behaves differently on PHP 5.2 than on PHP 5.3. I suppose the correct behavior of this function is the same as that of substr_count, only with multibyte support, and since substr_count doesn't count overlapping occurrences, neither should mb_substr_count.
So, although this answer works on PHP 5.2.6, do not use it, or you may run into problems when you update your PHP version.
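If plain substring counting is enough, overlapping occurrences can be counted without depending on version-specific behavior by looping over strpos and restarting one byte after each hit. The following is only a minimal sketch, assuming PHP 7 or later; count_overlapping is a hypothetical helper name, not a built-in, and unlike the regular expression in the question it does not check for word boundaries around the match.

```php
<?php
// Hypothetical helper (not part of PHP): counts occurrences of $needle in
// $haystack, including overlapping ones.
function count_overlapping(string $haystack, string $needle): int
{
    if ($needle === '') {
        return 0; // an empty needle has no meaningful count
    }
    $count = 0;
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $count++;
        // Step one byte past the start of the hit, not past the whole hit,
        // so that an occurrence overlapping this one can still be found.
        $offset = $pos + 1;
    }
    return $count;
}

$subject1 = " [word1 [word1 [word1 [word1 [word3 ";
echo substr_count($subject1, "[word1 [word1"), "\n";      // non-overlapping count
echo count_overlapping($subject1, "[word1 [word1"), "\n"; // overlapping count
```

For $subject1 this should give 2 for substr_count and 3 for the loop, which matches the result asked for in the question.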
Instead of preg_match_all, I'd use a while loop on preg_match with an offset:
$subject1 = " [word1 [word1 [word1 [word1 [word3 ";
$pattern1 = preg_quote("[word1 [word1", '/');
$offset=0;
$total=0;
while ($count = preg_match("/(?:\s|^|\W)$pattern1(?=\s|$|\W)/", $subject1, $matches, PREG_OFFSET_CAPTURE, $offset)) {
    // add this match to the running total
    $total += $count;
    // move the offset to one character past the start of this match,
    // so the next preg_match can find a match that overlaps this one
    $offset = $matches[0][1] + 1;
}
echo "total=$total\n";
Output:
total=3
The result for the second example is:
total=2
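As for whether a single regular expression can do this: one possible approach is to move the word sequence into a lookahead, so that each match consumes only the leading delimiter and the next search can begin inside the sequence just found. This is a sketch under that assumption, not a tested drop-in replacement; count_sequence is a hypothetical helper name, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: counts how often $sequence occurs in $subject as a
// delimited word sequence, overlapping occurrences included.
function count_sequence(string $subject, string $sequence): int
{
    $quoted = preg_quote($sequence, '/');
    // Same delimiters as in the question, but the sequence and its trailing
    // delimiter sit inside a lookahead, so they are checked without being consumed.
    $regex = '/(?:\s|^|\W)(?=' . $quoted . '(?:\s|$|\W))/';
    // preg_match_all returns false on error; the cast turns that into 0.
    return (int) preg_match_all($regex, $subject);
}

$subject1 = " [word1 [word1 [word1 [word1 [word3 ";
$subject2 = " [word1 [word2 [word1 [word2 [word1 [helloagain ";
echo "count of '[word1 [word1'=" . count_sequence($subject1, "[word1 [word1") . "\n";
echo "count of '[word1 [word2 [word1'=" . count_sequence($subject2, "[word1 [word2 [word1") . "\n";
```

With the two subjects from the question this should report 3 and 2. Compared with the while loop, it keeps everything in one preg_match_all call, at the cost of a slightly less readable pattern.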