Unexpected behavior of Java String split()

I am trying to split a string using the String split function. Here's an example:

    String[] list = "   Hello   ".split("\\s+");
    System.out.println("String length: " + list.length);
    for (String s : list) {
        System.out.println("----");
        System.out.println(s);
    }

Here's the output:

String length: 2
----

----
Hello

As you can see, the leading whitespace becomes an empty element in the String array, but the trailing whitespace does not.

Does anyone know why?

You need to use the other split method, which takes a limit argument, and specify a limit of -1:

String[] list = "   Hello   ".split("\\s+", -1);

This preserves the empty string produced by the trailing whitespace. By default, trailing empty strings are omitted, as per the Javadoc.
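As a quick check of the difference, the sketch below splits the same input with and without the limit argument. The class name `SplitLimitDemo` and the helper `describe` are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: shows the element count and contents,
    // so that empty strings are visible in the printed form.
    static String describe(String[] parts) {
        return parts.length + " " + Arrays.toString(parts);
    }

    public static void main(String[] args) {
        String input = "   Hello   ";
        // Default (limit 0): the trailing empty string is dropped.
        System.out.println(describe(input.split("\\s+")));     // 2 [, Hello]
        // Negative limit: the trailing empty string is kept.
        System.out.println(describe(input.split("\\s+", -1))); // 3 [, Hello, ]
    }
}
```

In both cases the leading empty string is still present; only the handling of the trailing one changes.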


Edit (answer to a comment):

To drop the leading empty element, you can strip the leading whitespace before splitting the String:

String str = "   Hello   ".replaceAll("^\\s+", "");
String[] list = str.split("\\s+", -1);
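To see what this combination yields on a slightly longer input, here is a minimal sketch. The class name `LeadingTrimDemo`, the method `splitWithoutLeading`, and the sample string are assumptions chosen for illustration.

```java
public class LeadingTrimDemo {
    // Strips leading whitespace, then splits while keeping trailing empty strings.
    static String[] splitWithoutLeading(String input) {
        String stripped = input.replaceAll("^\\s+", "");
        return stripped.split("\\s+", -1);
    }

    public static void main(String[] args) {
        String[] parts = splitWithoutLeading("   Hello   World   ");
        // Three elements: "Hello", "World", and one trailing empty string.
        System.out.println(parts.length);
        for (String p : parts) {
            System.out.println("[" + p + "]");
        }
    }
}
```

The leading empty element is gone because the string no longer starts with a delimiter, while the limit of -1 still keeps the trailing one.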

From the split documentation:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

So in reality split(regex) is the same as using

split(regex, 0);

and its documentation says:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

So if you want to include trailing empty strings, you just have to use a non-zero limit, such as

split("\\s+",10);

but this will also limit the result array to at most 10 elements. To avoid that restriction, use a negative number, such as

split("\\s+",-1);
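The three cases described in the documentation can be compared side by side. The following is only a sketch: the class name `LimitComparison`, the helper `show`, and the sample input are assumptions for illustration.

```java
import java.util.Arrays;

public class LimitComparison {
    // Hypothetical helper: splits on whitespace with the given limit
    // and returns a printable form of the resulting array.
    static String show(String input, int limit) {
        return Arrays.toString(input.split("\\s+", limit));
    }

    public static void main(String[] args) {
        String input = "a b c   ";
        // Zero: as many splits as possible, trailing empty strings discarded.
        System.out.println(show(input, 0));  // [a, b, c]
        // Positive: at most limit - 1 splits; the rest stays in the last element.
        System.out.println(show(input, 2));  // [a, b c   ]
        // Negative: as many splits as possible, trailing empty strings kept.
        System.out.println(show(input, -1)); // [a, b, c, ]
    }
}
```

A large positive limit such as 10 gives the same result as -1 for this input, but only because the input splits into fewer than 10 pieces.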
