PHP regexp to capture repeated character groups, e.g. hahaha jajajaja hihihi

Question

As the title says, is there a way in PHP, with preg_match_all, to catch all repetitions of a group of characters? For instance, catch:

hahahaha

jajajaj

hihihi

It's fine to catch repetitions of any characters, like abababab or acacacacac. Also, is there a way to count the number of repetitions?

The idea is to catch all these "forms" of smiling on social media. I figured out that there are also other cases, such as misspelled instances like ahahhahaah (where you have two consecutive a's or h's). Any ideas?

Answer 1

How about this:

preg_match_all('/((?i)[a-z])((?i)[a-z])(\1\2)+/', $str, $m);
$matches = $m[0]; //$matches will contain an array of matches

A bit complicated, but it does work. To explain, the first subpattern ( ((?i)[a-z]) ) matches any letter from a to z, regardless of case. The second subpattern ( ((?i)[a-z]) ) does the same thing. The third subpattern ( (\1\2)+ ) matches one or more repetitions of those two letters, in the same case as they first appeared, because the (?i) flag is scoped to the first two groups and does not apply to the backreferences. This regular expression also assumes that the match is made of whole two-letter pairs, i.e. it has an even number of characters. If you don't want that, you can add \1? at the end, meaning that (as long as there are one or more repetitions) the match can end with the first character once more (for instance, hahah and ikikikik would both be valid, but not asa).
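As a rough illustration of the \1? variant described above, here is a minimal sketch. The sample text and the variable names $text and $pattern are made up for this example and are not part of the original answer.

```php
<?php
// Hypothetical sample text, chosen only to exercise the pattern.
$text = 'lol hahah that was great, ikikikik! asa';

// Two letters, then one or more exact repeats of that pair,
// then optionally the first letter once more (the \1? suffix).
$pattern = '/((?i)[a-z])((?i)[a-z])(\1\2)+\1?/';

preg_match_all($pattern, $text, $m);

// $m[0] holds the full matches: here "hahah" and "ikikikik".
// "lol" and "asa" are skipped because the pair is never repeated in full.
print_r($m[0]);
```

Without the trailing \1?, the first match would stop at "haha" and leave the final "h" unmatched.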

To retrieve the number of repetitions for a specific match, you can do:

// -1 because the first two letters aren't repetitions; intdiv (PHP 7+) ignores a trailing odd letter matched by \1?
$numb = intdiv(strlen($matches[$index]), 2) - 1;

Answer 2

For the shortest repetition (e.g. ha gets repeated multiple times in hahahaha):

(.+?)\1+

See demo.

For the longest repetition (e.g. haha gets repeated in hahahaha):

(.+)\1+
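To make the difference between the two patterns concrete, the following sketch runs both against the same word. The variable names $word, $lazy and $greedy are illustrative only.

```php
<?php
$word = 'hahahaha'; // hypothetical input

// Lazy group: grows one character at a time, so it settles on the shortest unit.
preg_match('/(.+?)\1+/', $word, $lazy);

// Greedy group: starts as long as possible and backtracks, so it settles on the longest unit.
preg_match('/(.+)\1+/', $word, $greedy);

echo $lazy[1], "\n";   // ha
echo $greedy[1], "\n"; // haha
```

Both patterns match the whole word here; only the content of Group 1 differs. Note that on free text either pattern will also match any doubled character (such as the ll in hello), so some filtering on match length is probably needed in practice.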

Counting Repetitions

The non-regex solution is to compare the lengths of Group 1 (the repeated token) and the overall match: dividing the length of the match by the length of Group 1 gives the number of occurrences.
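The length comparison could look something like the sketch below. The function name countRepetitions and the shape of the returned array are assumptions made for this example, not something from the original answer.

```php
<?php
// Hypothetical helper: for every repeated run found in $text, report the match,
// the repeated unit (Group 1) and how many times the unit occurs.
function countRepetitions(string $text): array
{
    $result = [];
    // PREG_SET_ORDER keeps each match together with its own captures.
    preg_match_all('/(.+?)\1+/', $text, $sets, PREG_SET_ORDER);
    foreach ($sets as [$whole, $unit]) {
        $result[] = [
            'match' => $whole,
            'unit'  => $unit,
            // The match is the unit repeated N times, so N is the ratio of lengths.
            'count' => intdiv(strlen($whole), strlen($unit)),
        ];
    }
    return $result;
}

print_r(countRepetitions('hahahaha jajaja hihihi'));
```

For this input the counts are 4, 3 and 3. These are total occurrences of the unit; subtract 1 if the first occurrence should not count as a repetition.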

With pure regex, in .NET, you could simply do (.+?)(\1)+ and look at the number of captures in the Group 2 CaptureCollection object (Group 2 is the group that repeats; Group 1 only ever holds one capture).

In PHP, that's not possible, but there are some hacks. See, for instance, this question about matching a line number; it's the same technique. This is for "study purposes" only; you wouldn't want to use that in real life.
