
Can't get rid of extra spaces in java

I have extra spaces, for example "- - - -", that I'm trying to remove. I tried using the regex "\s+" and also wrote my own function.

// typed by hand
System.out.println(test.removeExtraSpaces("-   -   -  "));
// copied from the import file
System.out.println(test.removeExtraSpaces("-   -   -  "));

And my results are:

- - -
-   -   -  

For the first one, I typed the spaces by hand, three between each dash; the second one comes from an import file. I think the problem is that the imported characters aren't "real" spaces, perhaps a space with a different Unicode code point, but I don't know how to remove them.
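One way to test that guess is to print the code point of every character in the imported line, which makes look-alike spaces visible. This is a minimal sketch: the class name CodePointDump, the method describe, and the sample string are all hypothetical, and it assumes the imported line has already been read into a String.

```java
public class CodePointDump {
    // Returns each code point of s as "U+XXXX(decimal)", separated by spaces,
    // so characters that merely look like spaces can be told apart
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X(%d)", cp, cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: dash, space, non-breaking space, space, dash
        String line = "- \u00A0 -";
        System.out.println(describe(line));
    }
}
```

A character reported as U+00A0(160) instead of U+0020(32) is a non-breaking space rather than an ordinary one.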

I started off with a regex, but that didn't work, so I tried the method below, which produces the result in the image:

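public String removeExtraSpaces(String s) {
    // String is immutable: trim() returns a new String, so reassign the result
    s = s.trim();
    if (s.isEmpty()) {
        return s; // nothing to collapse; also avoids charAt(-1) below
    }
    StringBuilder newString = new StringBuilder();

    // Copy every character except a space that is followed by another space.
    // Note: only ' ' (U+0020) is compared here, so other space characters,
    // such as the non-breaking space U+00A0, are copied unchanged.
    for (int i = 0; i < s.length() - 1; i++) {
        if (s.charAt(i) != ' ' || s.charAt(i + 1) != ' ') {
            newString.append(s.charAt(i));
        }
    }
    // The loop stops one character early, so append the last character
    newString.append(s.charAt(s.length() - 1));

    return newString.toString().trim();
}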

Here is the result: http://i.imgur.com/WPAF8TB.png

EDIT: People have been suggesting a regex, which I've already tried, but here is proof that the regex does not work: http://i.imgur.com/IgY2v0r.png

The character with code point 160 (U+00A0) is a non-breaking space, which Java's regex engine does not treat as whitespace: by default \s matches only ASCII whitespace (space, tab, newline, vertical tab, form feed and carriage return), so \s will not match it. If you want to replace any kind of space (including the non-breaking one) and any whitespace (like tabs \t or line breaks \n and \r), try:

replaceAll("[\\p{Zs}\\s]+"," ")

From http://www.regular-expressions.info/unicode.html:

\p{Zs} will match any kind of space character
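To see the difference between the two classes on individual characters, a small check like the following can help. It is an illustrative sketch (SpaceClassCheck and its helper methods are hypothetical names) that tests a regular space, the non-breaking space, a tab, an em space (U+2003) and an ideographic space (U+3000) against \s and \p{Zs} separately.

```java
import java.util.regex.Pattern;

public class SpaceClassCheck {
    // Default \s: ASCII whitespace only
    static final Pattern WS = Pattern.compile("\\s");
    // \p{Zs}: Unicode general category "space separator"
    static final Pattern ZS = Pattern.compile("\\p{Zs}");

    static boolean isRegexWhitespace(char c) {
        return WS.matcher(String.valueOf(c)).matches();
    }

    static boolean isSpaceSeparator(char c) {
        return ZS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // regular space, non-breaking space, tab, em space, ideographic space
        char[] samples = { ' ', '\u00A0', '\t', '\u2003', '\u3000' };
        for (char c : samples) {
            System.out.printf("U+%04X  \\s=%b  \\p{Zs}=%b%n",
                    (int) c, isRegexWhitespace(c), isSpaceSeparator(c));
        }
    }
}
```

Neither class covers everything on its own: \s misses U+00A0 and the other Unicode spaces, while \p{Zs} misses the tab and line breaks, which is why the answer combines them in one character class.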


Demo:

// 45 = '-', 32 = ' ', 160 = non-breaking space (U+00A0)
char[] arr = { 45, 32, 160, 32, 45, 32, 160, 32, 45, 32, 160 };
String str = new String(arr);
System.out.println("original: \"" + str + "\"");
str = str.replaceAll("[\\p{Zs}\\s]+", " ");
System.out.println("replaced: \"" + str + "\"");

Output:

original: "-   -   -  "
replaced: "- - - "
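A different route, not the one used in this answer: since Java 7 the embedded flag (?U), equivalent to Pattern.UNICODE_CHARACTER_CLASS, makes \s mean \p{IsWhite_Space}, and that property includes U+00A0 as well as tabs and line breaks. A minimal sketch, with hypothetical class and method names:

```java
public class UnicodeWhitespaceRegex {
    // (?U) enables UNICODE_CHARACTER_CLASS, so \s matches any Unicode
    // White_Space character, including the non-breaking space U+00A0
    static String collapseSpaces(String s) {
        return s.replaceAll("(?U)\\s+", " ");
    }

    public static void main(String[] args) {
        // Same characters as the demo above: '-', ' ', U+00A0, ' ', ...
        char[] arr = { 45, 32, 160, 32, 45, 32, 160, 32, 45, 32, 160 };
        System.out.println("\"" + collapseSpaces(new String(arr)) + "\"");
    }
}
```

For this input it should print the same "- - - " as the [\p{Zs}\s]+ version; which one reads better is largely a matter of taste.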

\s+ only matches some of the Unicode whitespace characters. If you want to cover all of them, adapt your method to check for any of those characters instead of only ' '.
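A sketch of how the hand-written method could be adapted along those lines. It assumes that "space" should mean anything Character.isWhitespace or Character.isSpaceChar accepts; the combination matters because Character.isWhitespace on its own returns false for the non-breaking space U+00A0, while Character.isSpaceChar returns true for it. The class and helper names are hypothetical.

```java
public class SpaceCollapser {
    // Treat c as a space if Java classifies it as whitespace (space, tab,
    // line breaks, ...) or as a Unicode space character (categories Zs, Zl,
    // Zp, which include U+00A0)
    static boolean isAnySpace(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Collapses every run of space characters into a single ' '
    // and drops leading and trailing spaces
    static String removeExtraSpaces(String s) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAnySpace(c)) {
                pendingSpace = true; // remember the run, emit it later
            } else {
                // emit one space for the run, but never at the start
                if (pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                out.append(c);
                pendingSpace = false;
            }
        }
        // a trailing run is never emitted, which trims the end
        return out.toString();
    }

    public static void main(String[] args) {
        String imported = "-\u00A0\u00A0 -\u00A0 \u00A0-\u00A0 ";
        System.out.println("\"" + removeExtraSpaces(imported) + "\"");
    }
}
```

Deferring the space until the next non-space character arrives removes the need for the separate trim() calls and the last-character special case in the original loop.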
