
Regex to match/group repeating characters in a string

I need a regular expression that will match groups of repeated characters in a string. Here's an example string:

qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT

It should match:

(match group) "result"
(1) "q"
(2) "wwwwwwwww"
(3) "eeeee"
(4) "rr"
(5) "t"
(6) "yyyyy"
(7) "qqqq"
(8) "w"
(9) "EE"
(10) "r"
(11) "TTT"

After doing some research, this is the best I could come up with:

/(.)(\1*)/g

The problem I'm having is that the only way to use the \1 back-reference is to capture the character first, so each run ends up split across two groups. If I could reference the result of a non-capturing group I could solve this problem, but after researching I don't think that's possible.

How about /((.)(\2*))/g (untested)? That way, you match the group as a whole (I'm assuming that's what you want, and that's what's lacking from the solution you found).

Looks like you need to use a Matcher in a loop:

Pattern p = Pattern.compile("((.)\\2*)");
Matcher m = p.matcher("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT");
while (m.find()) {
    System.out.println(m.group(1));
}

Outputs:

q
wwwwwwwww
eeeee
rr
t
yyyyy
qqqq
w
EE
r
TTT
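
As a small variation on the loop above, the whole match (group 0) already spans the entire run, so the outer capturing group is optional. The following is only a minimal sketch: the class name RunMatcher and the method name runs are made up for illustration, and it assumes the input contains no line terminators, since `.` does not match them by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunMatcher {
    // (.) captures one character; \1* then consumes any further copies of it.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    // Returns each run of identical characters, in order of appearance.
    static List<String> runs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            result.add(m.group()); // group 0 is the whole run
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```

If the repeated character itself is needed, m.group(1) still holds it.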

Assuming what @cruncher said as a premise is true: "we want to catch repeating letter groups without knowing beforehand which letter should be repeating", then:

/((a*?+)|(b*?+)|(c*?+)|(d*?+)|(e*?+)|(f*?+)|(g*?+)|(h*?+))/

The above regex should allow the capture of repeating letter groups without hardcoding a particular order in which they would occur.

The ?+ is meant as a reluctant possessive quantifier, which helps avoid wasting memory: if the current case is valid, previously valid backtracking states are not saved.
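
To make the one-alternative-per-letter idea concrete, here is a rough Java sketch that builds such a pattern programmatically. Note the swaps: it uses a plain greedy `+` on each letter instead of the quantifier shown above, and it assumes only ASCII letters matter. The class and method names (AlternationRuns, buildPattern, runs) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationRuns {
    // Builds "a+|b+|...|z+|A+|...|Z+": one alternative per ASCII letter (an assumption).
    static Pattern buildPattern() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            sb.append(c).append("+|");
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            sb.append(c).append("+|");
        }
        sb.setLength(sb.length() - 1); // drop the trailing '|'
        return Pattern.compile(sb.toString());
    }

    // Collects every run of a single repeated letter; non-letters are skipped.
    static List<String> runs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = buildPattern().matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```

Unlike the back-reference approach, this only recognizes the characters listed in the alternation, so digits or punctuation in the input are ignored.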

Since you did tag java, I'll give an alternative non-regex solution (I believe the requirements define the end product, not the method by which you get there).

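// s is the input string; doSomething is whatever you want done with each run.
String repeat = "";
char c = '\0'; // placeholder value; '' is not a valid char literal in Java
for (int i = 0; i < s.length(); i++) {
    if (s.charAt(i) == c) {
        repeat += c; // same character: extend the current run
    } else {
        if (!repeat.isEmpty())
            doSomething(repeat); // add to an array if you want
        c = s.charAt(i);
        repeat = "" + c; // start a new run
    }
}
// emit the final run
if (!repeat.isEmpty())
    doSomething(repeat);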
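
The same non-regex idea can be wrapped in a self-contained method. This is just one possible sketch: it tracks the start index of the current run and uses substring rather than string concatenation, and the names RunSplitter and splitRuns are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class RunSplitter {
    // Splits s into maximal runs of identical characters.
    static List<String> splitRuns(String s) {
        List<String> runs = new ArrayList<>();
        int start = 0; // index where the current run begins
        for (int i = 1; i <= s.length(); i++) {
            // A run ends at the end of the string or when the character changes.
            if (i == s.length() || s.charAt(i) != s.charAt(start)) {
                runs.add(s.substring(start, i));
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(splitRuns("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```

Because it compares char values directly, it treats uppercase and lowercase letters as different characters, which matches the expected result in the question ("w" followed by "EE").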

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM