PHP Regexp capturing repeating group of chars, e.g. hahaha jajajaja hihihi

As the title says, is there a way in PHP, with preg_match_all, to catch all repetitions of a group of chars? For instance, catch:

  1. hahahaha
  2. jajajaj
  3. hihihi

It's fine to catch the repetition of any chars, like abababab or acacacacac. Also, is there a way to count the number of repetitions?

The idea is to catch all these "forms" of smiling on social media. I figured out that there are also other cases, such as misspelled instances like ahahhahaah (where you have two consecutive a or h). Any ideas?

How about this:

preg_match_all('/((?i)[a-z])((?i)[a-z])(\1\2)+/', $str, $m);
$matches = $m[0]; //$matches will contain an array of matches

A bit complicated, but it does work. To explain, the first subpattern ( ((?i)[a-z]) ) matches any character between a and z, no matter the case. The second subpattern ( ((?i)[a-z]) ) does the same thing. The third subpattern ( (\1\2)+ ) matches one or more repetitions of the first two letters, in the same case as they originally appeared. This regular expression also assumes that the match is made of whole pairs, i.e. an even number of letters. If you don't want that, you can add \1? at the end, meaning that (as long as it contains one or more repetitions) it can end with the first character (for instance, hahah and ikikikik would both be valid, but not asa ).
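As a quick check of the explanation above, here is a minimal sketch that runs the pattern with the optional trailing \1? over a sample string. The input text is made up for illustration.

```php
<?php
// Sample input, invented for this example.
$str = 'lol hahah that was ikikikik, asa not so much';

// Pattern from the answer, with the optional trailing \1 added
// so a match may end on the first letter of the pair.
preg_match_all('/((?i)[a-z])((?i)[a-z])(\1\2)+\1?/', $str, $m);

// $m[0] holds the full matches; here that is "hahah" and "ikikikik".
// "asa" is skipped because the pair "as" is never repeated in full.
print_r($m[0]);
```

Note that (?i) only affects the character classes, so the repeated pair must use the same case as the first occurrence: "HaHa" would match, "Haha" would not.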

To retrieve the number of repetitions for a specific match, you can do:

$numb = strlen($matches[$index])/2 - 1; //-1 because the first two letters aren't repetitions
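If the optional trailing \1? is in use, a match can have an odd length and the plain division yields a fraction. One way around that, sketched here with a hypothetical helper named countPairRepetitions (not part of the original answer), is to use integer division:

```php
<?php
// Hypothetical helper for illustration only.
function countPairRepetitions(string $match): int
{
    // Each pair is two letters; the first pair is the original, not a repetition.
    // intdiv() drops an optional trailing single letter, e.g. the last "h" in "hahah".
    return intdiv(strlen($match), 2) - 1;
}

// Sample input, invented for this example.
$str = 'hahaha and jajajaja and hihihi';
preg_match_all('/((?i)[a-z])((?i)[a-z])(\1\2)+/', $str, $m);

foreach ($m[0] as $match) {
    // Prints "hahaha: 2", "jajajaja: 3", "hihihi: 2", one per line.
    echo $match, ': ', countPairRepetitions($match), PHP_EOL;
}
```

intdiv() requires PHP 7 or later; on older versions, (int) floor(strlen($match) / 2) should behave the same way.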

For the shortest repetition (e.g. ha gets repeated multiple times in hahahaha):

(.+?)\1+

See demo.

For the longest repetition (e.g. haha gets repeated in hahahaha):

(.+)\1+
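To see the difference between the lazy and greedy variants, here is a small sketch that applies both to the same string and prints what Group 1 captured. The variable names are only illustrative.

```php
<?php
$str = 'hahahaha';

// Lazy quantifier: Group 1 settles on the shortest unit that repeats.
preg_match('/(.+?)\1+/', $str, $lazy);

// Greedy quantifier: Group 1 settles on the longest unit that repeats.
preg_match('/(.+)\1+/', $str, $greedy);

echo $lazy[1], PHP_EOL;   // ha
echo $greedy[1], PHP_EOL; // haha
```

Both variants match the whole string here; only the captured unit differs. Keep in mind that either pattern also matches a single doubled character, such as the ll in hello, so some extra filtering is probably needed on real text.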

Counting Repetitions

The non-regex solution is to compare the lengths of Group 1 (the repeated token) and the overall match.
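The length comparison could look something like the sketch below. The helper name countUnitOccurrences is hypothetical, and it uses the lazy pattern so that Group 1 is the shortest repeated unit.

```php
<?php
// Hypothetical helper: returns how many times the repeated unit occurs
// in the first match, or 0 when nothing in the text repeats.
function countUnitOccurrences(string $text): int
{
    if (!preg_match('/(.+?)\1+/', $text, $m)) {
        return 0;
    }
    // Overall match length divided by the unit length gives the
    // total number of occurrences, including the first one.
    return intdiv(strlen($m[0]), strlen($m[1]));
}

echo countUnitOccurrences('hahahaha'), PHP_EOL; // 4
echo countUnitOccurrences('hihihi'), PHP_EOL;   // 3
```

Subtract 1 from the result if only the repetitions after the first occurrence should be counted.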

With pure regex, in .NET, you could simply do (.+?)(\1)+ and look at the number of captures in the Group 2 CaptureCollection object, since Group 2 is the one that gets repeated.

In PHP, that's not possible, but there are some hacks. See, for instance, this question about matching a line number; it's the same technique. This is for "study purposes" only: you wouldn't want to use that in real life.
