简体   繁体   English

PHP Regex检测单词中的重复字符

[英]PHP Regex detect repeated character in a word

(preg_match('/(.)\1{3}/', $repeater))

I am trying to create a regular expression which will detect a word that repeats a character 3 or more times throughout the word. 我正在尝试创建一个正则表达式,该表达式将检测在整个单词中重复一个字符3次或更多次的单词。 I have tried this numerous ways and I can't seem to get the correct output. 我已经尝试了很多方法,但似乎无法获得正确的输出。

If you don't need letters to be contiguous, you can do it with this pattern: 如果不需要字母连续,则可以使用以下模式:

\b\w*?(\w)\w*?\1\w*?\1\w*

otherwise this one should suffice: 否则,这个就足够了:

\b\w*?(\w)\1{2}\w*

Try this regex instead 试试这个正则表达式

(preg_match('/(.)\1{2,}/', $repeater))

This should match 3 or more times, see example here http://regexr.com/3fk80 这应该匹配3次或更多次,请参见此处的示例http://regexr.com/3fk80

Strictly speaking, regular expressions that include \\1 , \\2 , ... things are not mathematical regular expressions and the scanner that parses them is not efficient in the sense that it has to modify itself to include the accepted group, in order to be used to match the discovered string, and in case of failure it has to backtrack for the length of the matched group. 严格来说,包含\\1\\2 ...的正则表达式不是数学正则表达式,而解析它们的扫描程序在必须修改自身以包含接受的组的意义上效率不高。用于匹配发现的字符串,并且在失败的情况下,必须回溯匹配组的长度。

The canonical way to express a true regular expression that accepts word characters repeated three or more times is 表示接受三个或三个以上重复的单词字符的正则表达式的规范方法是

(A{3,}|B{3,}|C{3,}|...|Z{3,}|a{3,}|b{3,}|...|z{3,})

and there's no associativity of the operator {3,} to be able to group it as you shown in your question. 并且运算符{3,}没有关联性{3,}无法按照您在问题中显示的进行分组。

For the pedantic, the pure regular expression should be: 对于pedantic,纯正则表达式应为:

(AAAA*|BBBB*|CCCC*|...|ZZZZ*|aaaa*|bbbb*|cccc*|...|zzzz*)

again, this time, you can use the fact that AAAA* is matched as soon as three A s are found, so it would be valid also the regex: 再次,这一次,您可以使用以下事实:一旦找到三个AAAAA*被匹配,因此它对正则表达式也有效:

AAA|BBB|CCC|...|ZZZ|aaa|bbb|ccc|...|zzz

but the first version allow you to capture the \\1 group that delimits the actual matching sequence. 但是第一个版本允许您捕获\\1组,该组界定了实际的匹配顺序。

This approach will be longer to write but is by far much more efficient when parsing the data string, as it has no backtrack at all and visits each character only once. 这种方法的编写时间更长,但在解析数据字符串时效率更高,因为它完全没有回溯,并且每个字符只能访问一次。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM