How to remove repeated letters in Java using regular expressions while being case-insensitive

I have been trying to replace any repeated letters with the lower-case version of that letter (in Java). For example, I want a function that maps:

bob -> bob
bOb -> bob
bOOb -> bob
bOob -> bob
boOb -> bob
bob -> bob
Bob -> Bob
bOb -> bob

However, I have not managed to do this using regexes (in Java).

I have tried the following:

    String regex = "([A-Za-z])\\1+";
    String str ="bOob";
    Pattern pattern = Pattern.compile(regex , Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    System.out.println(matcher.replaceAll("$1"));

However, this returns bOb and not bob (it does work on boOb).

I also tried:

        Pattern pattern = Pattern.compile("(?i)([A-Za-z0-9])(?=\\1)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(str);
        return matcher.replaceAll("");

This solves one problem, since now bOob -> bob, but it introduces another: the lookahead drops the earlier letter of each pair and keeps the last one, so boOb now maps to bOb.

NOTE: it should also map BOobOoboObOoObooOoOoOoOoOOb -> Bobobobobob.

I feel that at this point it might just be easier to loop over the string and apply some logic to each character, but I didn't want to give up on regexes... If a regex solution exists, is it likely to be more efficient than a loop over each character?

Thanks in advance!

PS: I am aware that one could just lower-case everything before passing the string in, but that's not what I want, because it maps:

Bob -> bob

Use Matcher#group() instead of $1 here:

if (matcher.find()) {
    System.out.println(matcher.replaceAll(matcher.group(1)
                                          .toLowerCase()));
}

This lets you call toLowerCase() on the captured text.

EDIT: (in response to OP's comments)

Matcher#group(n) is the same as $n: it refers to the n-th capture group. So group(1) and $1 both refer to the captured O, except that with group(1) you can call toLowerCase() on the capture.

The loop is run by replaceAll(), not by find(). Matcher#find() is required to initialize the groups, so that group(1) returns the capture before replaceAll() is invoked.
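A small sketch of that evaluation order, in case it helps: the replacement string is built once from the first match and then reused by replaceAll() for every match. The class and method names here are hypothetical, and the early return for non-matching input is my own addition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedReplacementDemo {
    private static final Pattern REPEATS =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper mirroring the snippet above; returns the input
    // unchanged when there is no repeated letter at all.
    static String collapseWithFirstCapture(String input) {
        Matcher matcher = REPEATS.matcher(input);
        if (!matcher.find()) {
            return input;
        }
        // Evaluated once, before replaceAll() resets the matcher and rescans.
        String replacement = matcher.group(1).toLowerCase();
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(collapseWithFirstCapture("bOob"));  // bob
        System.out.println(collapseWithFirstCapture("bOobb")); // boo
    }
}
```

In the second call the run "bb" is also replaced with "o", because the replacement was computed from the first match only.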

But this also means the capture stays the same for every replacement. That is sufficient for your requirements, but the matcher would need to be reset for a string like BOobbOobboObbOoObbooOoOoOoOoOOb (notice the double b's). The loop then has to be driven by Matcher#find(), which means replaceAll() gets traded for replaceFirst().

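String regex = "([A-Za-z])\\1+";
String str = "BOobbOobboObbOoObbooOoOoOoOoOObb";

Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);

// Each pass collapses the first remaining run of repeated letters.
while (matcher.find()) {
    // Keep the original case only when the run starts the input.
    String replacement = matcher.start() > 0
            ? matcher.group(1).toLowerCase()
            : matcher.group(1);
    str = matcher.replaceFirst(replacement);
    matcher.reset(str); // rescan the updated string from the beginning
}

System.out.println(str); // Bobobobobob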

Matcher#start() is used here to detect a match at the start of the input, where the case is left untouched.
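If rescanning the string after every replacement is a concern, the same rule can probably be applied in one pass with Matcher#appendReplacement() and Matcher#appendTail(). This is only a sketch of that alternative, not a benchmarked improvement; the class and method names are hypothetical, and using Locale.ROOT for lower-casing is my own assumption to avoid locale-specific surprises.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassCollapse {
    private static final Pattern REPEATS =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collapses each run of a repeated ASCII letter
    // (ignoring case) to one letter, lower-cased unless the run starts the input.
    static String collapse(String input) {
        Matcher matcher = REPEATS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String first = matcher.group(1);
            String replacement =
                    matcher.start() > 0 ? first.toLowerCase(Locale.ROOT) : first;
            // The replacement is a single ASCII letter, so it contains no
            // '$' or '\' that would need Matcher.quoteReplacement().
            matcher.appendReplacement(out, replacement);
        }
        matcher.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("BOobbOobboObbOoObbooOoOoOoOoOObb")); // Bobobobobob
    }
}
```

One pass is enough here because each match covers a whole run, so the letters left on either side of a collapsed run are always different letters.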

I think this is the code I was looking for (based on the accepted answer):

public String removeRepeatedLetters(String str, boolean caseSensitive) {
    if (caseSensitive) {
        return this.removeRepeatedLetters(str); // uses the case-sensitive version
    } else {
        Pattern patternRep = Pattern.compile("([A-Za-z])(\\1+)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = patternRep.matcher(str);
        String output = str;
        while (matcher.find()) {
            // Replace the first remaining run with its first letter, lower-cased.
            String matchStr = matcher.group(1);
            output = matcher.replaceFirst(matchStr.toLowerCase());
            matcher = patternRep.matcher(output);
            matcher.reset(); // redundant: a newly created Matcher starts in the reset state
        }
        return output;
    }
}

What it does is replace any run of repeated letters (upper or lower case) with a single lower-case letter.

I think it is very close to working the way I want, though it maps Bbob -> bob. I doubt that failing to map it to Bob will matter much for my use case.

By the way, if anyone can see how to optimize this, feel free to comment! The .reset() call annoys me a little, though I am not sure whether it is necessary.
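For comparison, here is a rough sketch of the plain-loop approach mentioned in the question. It assumes the same rule as the accepted answer: a run of the same ASCII letter (ignoring case) collapses to one letter, lower-cased unless the run starts the string, and everything else is left as it is. The class and method names are hypothetical, and I have not measured whether it is actually faster than the regex versions.

```java
class LoopCollapse {
    // Only ASCII letters are treated as collapsible, like [A-Za-z] in the regex.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Hypothetical helper: walks the string once, one run at a time.
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char first = input.charAt(i);
            int end = i + 1;
            if (isAsciiLetter(first)) {
                char lower = Character.toLowerCase(first);
                // Extend the run while the next char is the same ASCII letter.
                while (end < input.length()
                        && isAsciiLetter(input.charAt(end))
                        && Character.toLowerCase(input.charAt(end)) == lower) {
                    end++;
                }
            }
            boolean repeated = end - i > 1;
            // Lower-case a collapsed run unless it starts the input.
            out.append(repeated && i > 0 ? Character.toLowerCase(first) : first);
            i = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("BOobOoboObOoObooOoOoOoOoOOb")); // Bobobobobob
    }
}
```

Unlike the method above, this sketch maps Bbob -> Bob, because a run at the start of the string keeps the case of its first letter.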
