
Unexpected behavior of Java String split()

I am trying to split a string using the String.split() method. Here's an example:

    String[] list = "   Hello   ".split("\\s+");
    System.out.println("String length: " + list.length);
    for (String s : list) {
        System.out.println("----");
        System.out.println(s);
    }

Here's the output:

String length: 2
----

----
Hello

As you can see, the leading whitespace becomes an empty element in the String array, but the trailing whitespace does not.

Does anyone know why?

You need to use the other split method, which takes a limit argument, and pass a limit of -1:

    String[] list = "   Hello   ".split("\\s+", -1);

This preserves the trailing empty string. The default behavior, per the Javadoc, is to omit trailing empty strings.
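A small self-contained comparison of the two calls on the string from the question. The class and method names (`SplitLimitDemo`, `splitDefault`, `splitKeepTrailing`) are only illustrative; the behavior comes entirely from the standard `String.split` overloads.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // One-argument split: behaves like a limit of 0, so trailing empty strings are dropped.
    static String[] splitDefault(String input) {
        return input.split("\\s+");
    }

    // Negative limit: the pattern is applied as often as possible and
    // trailing empty strings are kept.
    static String[] splitKeepTrailing(String input) {
        return input.split("\\s+", -1);
    }

    public static void main(String[] args) {
        String input = "   Hello   ";
        // The whitespace match at index 0 yields an empty leading element in both cases.
        System.out.println(Arrays.toString(splitDefault(input)));      // prints [, Hello]
        System.out.println(Arrays.toString(splitKeepTrailing(input))); // prints [, Hello, ]
    }
}
```

The leading empty element appears because the first match of `\s+` starts at index 0, so the substring before it is empty. Since Java 8, only a zero-width match at the beginning of the string suppresses that leading empty substring; a positive-width match like this one keeps it, whatever the limit.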


Edit (answer to a comment):

To get rid of the leading empty element as well, strip the leading whitespace before splitting the String:

    String str = "   Hello   ".replaceAll("^\\s+", "");
    String[] list = str.split("\\s+", -1);
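A runnable sketch of those two lines, wrapped in a hypothetical helper `stripLeadingThenSplit` (the name is illustrative, not an existing API). For comparison it also shows `trim()`, a different technique that removes whitespace from both ends, so no empty element is produced at either end.

```java
import java.util.Arrays;

public class StripLeadingThenSplit {
    // Hypothetical helper: remove leading whitespace first, then split with
    // limit -1 so that a trailing empty string is still reported.
    static String[] stripLeadingThenSplit(String input) {
        String str = input.replaceAll("^\\s+", "");
        return str.split("\\s+", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stripLeadingThenSplit("   Hello   "))); // prints [Hello, ]
        // trim() strips both ends, so only the word itself remains.
        System.out.println(Arrays.toString("   Hello   ".trim().split("\\s+")));  // prints [Hello]
    }
}
```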

From the split documentation:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

So in practice, split(regex) is the same as calling

    split(regex, 0);
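A quick check of that equivalence on the question's input. The helper names `oneArg` and `zeroLimit` are illustrative only.

```java
import java.util.Arrays;

public class SplitZeroLimit {
    // The one-argument overload...
    static String[] oneArg(String input) {
        return input.split("\\s+");
    }

    // ...and the two-argument overload with an explicit limit of 0.
    static String[] zeroLimit(String input) {
        return input.split("\\s+", 0);
    }

    public static void main(String[] args) {
        String input = "   Hello   ";
        System.out.println(Arrays.equals(oneArg(input), zeroLimit(input))); // prints true
        System.out.println(Arrays.toString(zeroLimit(input)));              // prints [, Hello]
    }
}
```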

and the documentation of that method says:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
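To make the three cases concrete, here is a sketch that splits a comma-separated string containing both an inner empty field and trailing empty fields. The input `"a,b,,c,,"` and the helper name `splitWithLimit` are made up for illustration.

```java
import java.util.Arrays;

public class SplitLimitSemantics {
    static String[] splitWithLimit(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,c,,";
        // limit > 0: pattern applied at most limit - 1 times; the last entry holds the rest.
        String[] positive = splitWithLimit(input, 2);
        // limit == 0: applied as often as possible, trailing empty strings discarded.
        String[] zero = splitWithLimit(input, 0);
        // limit < 0: applied as often as possible, trailing empty strings kept.
        String[] negative = splitWithLimit(input, -1);

        System.out.println(positive.length + " " + Arrays.toString(positive)); // prints 2 [a, b,,c,,]
        System.out.println(zero.length + " " + Arrays.toString(zero));         // prints 4 [a, b, , c]
        System.out.println(negative.length + " " + Arrays.toString(negative)); // prints 6 [a, b, , c, , ]
    }
}
```

Note that with a limit of 0 the empty string between the two inner commas survives; only the empty strings at the end of the array are discarded.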

So if you want to include trailing empty strings, you have to use a non-zero limit, such as

    split("\\s+", 10);

but this also limits the resulting array to at most 10 elements. To avoid that, use a negative number instead, such as

    split("\\s+", -1);
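A sketch showing the cap that a positive limit imposes, using a made-up input of twelve space-separated tokens; `splitWords` is an illustrative helper name.

```java
public class SplitCapDemo {
    static String[] splitWords(String input, int limit) {
        return input.split("\\s+", limit);
    }

    public static void main(String[] args) {
        String input = "1 2 3 4 5 6 7 8 9 10 11 12";
        // A limit of 10 applies the pattern at most 9 times, so the
        // 10th element holds everything that was not split off.
        String[] capped = splitWords(input, 10);
        // A negative limit applies the pattern as often as possible.
        String[] uncapped = splitWords(input, -1);

        System.out.println(capped.length);             // prints 10
        System.out.println(capped[capped.length - 1]); // prints 10 11 12
        System.out.println(uncapped.length);           // prints 12
        // On the question's input there are fewer than 10 pieces, so 10 and -1 agree.
        System.out.println(splitWords("   Hello   ", 10).length); // prints 3
    }
}
```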
