
String class split function is returning ambiguous results

Below are two code snippets:

System.out.println(",,,,,".split(",").length);

and

System.out.println(",,,,, ".split(",").length);

The first snippet prints 0 and the second prints 6.

My question is: why is split unable to recognize "," when there is no extra space at the end of the string, yet able to recognize it when I add an extra space at the end?

Please note that I have also tried the regex "\\s*,\\s", but the result is the same.

I don't have a doc reference for this, but empirically, what I saw in my testing of String#split is that if the split produces no non-empty tokens, then the empty strings are not returned in the array either. So the following returns an empty array:

",,,,,".split(",")

However, if you add a space after the last comma and do the same split, there is now a single non-empty token (the space). As a result, the array comes back with every element, including the empty strings before it:

",,,,, ".split(",")
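One way to check this empirically is to print each element with quotes around it, so that empty strings become visible. The following is only a sketch; the class name SplitContentsDemo and the helper show are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitContentsDemo {
    // Illustrative helper: splits on "," and renders every element in quotes.
    static String show(String input) {
        return Arrays.stream(input.split(","))
                .map(part -> "\"" + part + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(show(",,,,,"));  // []
        System.out.println(show(",,,,, ")); // ["", "", "", "", "", " "]
    }
}
```

The second array holds five empty strings followed by the single space, which accounts for the length of 6.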

But because there is no content between the commas, I would interpret your real requirement as wanting each individual comma as a separate result. If so, you can split using lookarounds, something like this:

String input = ",,,,,";
String[] parts = input.split("(?<=,)(?=,)");
for (String part : parts) {
    System.out.println(part);
}

This outputs:

,
,
,
,
,


By default, split() in Java removes trailing empty strings from the result array. To keep them, use split(delimiter, limit) with limit set to a negative value, like this:

System.out.println(",,,,,".split(",", -1).length); // 6

Let's look at a few more interesting results of split:

System.out.println(",,,,,,".split(",").length); // 0
System.out.println(",,,,,, ".split(",").length); // 7
System.out.println(",,, ,,,".split(",").length); // 4
System.out.println(" ,,,,,,".split(",").length); // 1

If you are wondering why this happens, it is because of the following statement in the docs for the split method:

Trailing empty strings are therefore not included in the resulting array.

Docs: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)

If you don't want the split method to remove those trailing empty strings, use the other overload of split, which takes a limit:

public String[] split(String regex, int limit)

Docs: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)

Example:

System.out.println(",,,,,,".split(",",-1).length); // 7
System.out.println(",,,,,, ".split(",",-1).length); // 7
System.out.println(",,, ,,,".split(",",-1).length); // 7
System.out.println(" ,,,,,,".split(",",-1).length); // 7

Forget the documentation. I looked directly at the source and found the following piece of code in java.lang.String#split(java.lang.String, int):

while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
    resultSize--;
}

This shows that the method is designed to remove the last element if it is empty, and to keep doing so until the last element is not zero-length.

This feature is useful. For example, if you have the string "a,b,", the resulting array should contain a and b, not an extra empty string '' after the last comma.

If you run System.out.println(", ,,,".split(",").length); it prints 2, because the while loop above keeps shrinking the result from the right until it finds an element whose length is non-zero.
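To make the effect of that loop concrete, here is a rough sketch that applies the same trimming by hand to an untrimmed split result. The class TrailingTrimSketch and the helper trimTrailingEmpty are hypothetical names for illustration; this is not the JDK's actual implementation, only the quoted loop wrapped in a standalone method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TrailingTrimSketch {
    // Hypothetical helper: runs the quoted loop over an already-split array.
    static String[] trimTrailingEmpty(String[] parts) {
        List<String> list = new ArrayList<>(Arrays.asList(parts));
        int resultSize = list.size();
        // Walk back from the right while the last element is empty.
        while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
            resultSize--;
        }
        return list.subList(0, resultSize).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = ", ,,,";
        String[] untrimmed = input.split(",", -1); // negative limit keeps trailing empty strings
        String[] trimmed = trimTrailingEmpty(untrimmed);
        System.out.println(untrimmed.length); // 5
        System.out.println(trimmed.length);   // 2
        System.out.println(Arrays.equals(trimmed, input.split(","))); // true
    }
}
```

Only the empty strings at the right end are dropped; the leading empty string and the single space survive, which is why the count is 2.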

The while loop above is enclosed in if (limit == 0). So if you want to count every element, use a non-zero limit. If you don't want any limit at all, use a negative number such as -1.
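As a quick comparison of the three cases, the sketch below splits the same input with a zero, a negative, and a positive limit. The class name SplitLimitDemo, the helper count, and the sample input "a,b,,," are chosen purely for illustration.

```java
class SplitLimitDemo {
    // Illustrative helper: number of elements split returns for a given limit.
    static int count(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        String input = "a,b,,,";
        System.out.println(count(input, 0));  // 2: trailing empty strings removed
        System.out.println(count(input, -1)); // 5: every element kept
        System.out.println(count(input, 3));  // 3: at most 3 elements, the last one is ",,"
    }
}
```

A positive limit also skips the trimming loop, but it caps the array length, so a negative limit is the simpler choice when the goal is to keep everything.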
