Why does "split" on an empty string return a non-empty array?

Splitting an empty string returns an array of size 1:

scala> "".split(',')
res1: Array[String] = Array("")

Consider that this returns an empty array:

scala> ",,,,".split(',')
res2: Array[String] = Array()

Please explain :)

If you split an orange zero times, you still have exactly one orange.

Splitting an empty string returns the empty string as the first element. If no delimiter is found in the target string, you get an array of size 1 holding the original string, even if it is empty.

The Java and Scala split methods operate in two steps, like this:

  • First, split the string by the delimiter. The natural consequence is that if the string does not contain the delimiter, a singleton array containing just the input string is returned.
  • Second, remove all the rightmost empty strings. This is the reason ",,,".split(",") returns an empty array.
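The two steps can be imitated by hand. The following is a minimal sketch, not the JDK's implementation: the class and method names are hypothetical, and step one is approximated with split(regex, -1), which keeps every empty string. Note that this naive version would return an empty array for "", which is not what the real method does.

```java
import java.util.Arrays;

// Hypothetical helper class, written only to illustrate the two steps.
final class TwoStepSplit {

    // Step 1: split on every delimiter and keep all empty strings (limit = -1).
    static String[] splitKeepingEmpties(String input, String regex) {
        return input.split(regex, -1);
    }

    // Step 2: drop empty strings from the right-hand end of the array.
    static String[] dropTrailingEmpties(String[] parts) {
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        String[] step1 = splitKeepingEmpties(",,,", ",");
        String[] step2 = dropTrailingEmpties(step1);
        System.out.println(step1.length); // 4
        System.out.println(step2.length); // 0
        System.out.println(Arrays.equals(step2, ",,,".split(","))); // true
    }
}
```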

According to this, the result of "".split(",") should be an empty array because of the second step, right?

It should. Unfortunately, this is an artificially introduced corner case. That is bad, but at least it is documented in java.util.regex.Pattern, if you remember to look at the documentation:

For n == 0, the result is as for n < 0, except trailing empty strings will not be returned. (Note that the case where the input is itself an empty string is special, as described above, and the limit parameter does not apply there.)
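A quick way to see the special case is to split the empty string with several limits. This is a small sketch using only String.split; the class and method names are hypothetical.

```java
// Hypothetical demo class; only String.split from the JDK is used.
final class EmptyInputSplit {

    // Number of elements returned when splitting on "," with the given limit.
    static int partCount(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        // The empty input yields one element whatever the limit is.
        for (int limit : new int[] {-1, 0, 1, 5}) {
            System.out.println("limit " + limit + ": " + partCount("", limit)); // 1 each time
        }
        // By contrast, a lone delimiter with limit 0 is trimmed down to nothing.
        System.out.println(partCount(",", 0)); // 0
    }
}
```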

Solution 1: Always pass -1 as the second parameter

So, I advise you to always pass -1 as the second parameter (this skips step two above), unless you know exactly what you want to achieve or you are sure that your program will never receive the empty string as input.
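One way to see why -1 is the predictable choice: with no trimming, the number of fields is always the number of delimiters plus one. The sketch below checks that for a few inputs; the class and helper names are hypothetical.

```java
// Hypothetical demo class illustrating the effect of limit = -1.
final class LimitMinusOne {

    // Counts occurrences of a single-character delimiter.
    static int countDelimiters(String input, char delimiter) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == delimiter) {
                count++;
            }
        }
        return count;
    }

    // With limit -1 nothing is trimmed, so fields == delimiters + 1.
    static boolean fieldCountIsConsistent(String input) {
        return input.split(",", -1).length == countDelimiters(input, ',') + 1;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"", ",", ",,,", "a,b", "a,b,,"}) {
            System.out.println("\"" + input + "\" -> " + fieldCountIsConsistent(input)); // true each time
        }
    }
}
```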

Solution 2: Use the Guava Splitter class

If you are already using Guava in your project, you can try the Splitter class (see its documentation). It has a very rich API and makes your code very easy to understand.

Splitter.on(".").split(".a.b.c.") // "", "a", "b", "c", ""
Splitter.on(",").omitEmptyStrings().split("a,,b,,c") // "a", "b", "c"
Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c") // "a", "b", "c"
Splitter.onPattern("=>?").split("a=b=>c") // "a", "b", "c"
Splitter.on(",").limit(2).split("a,b,c") // "a", "b,c"

For the same reason that

",test" split ','

and

",test," split ','

will return an array of size 2. Everything before the first match is returned as the first element.

"a".split(",") -> "a", therefore "".split(",") -> ""

In all programming languages I know, a blank string is still a valid String. So splitting it with any delimiter will always return a single-element array whose element is the blank String. If it were a null (not blank) String, that would be a different issue.
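The cases discussed in the last few answers can be checked side by side. This is a small sketch with hypothetical names; it prints each element in quotes so that an array holding one empty string is distinguishable from an empty array.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical demo class; it only calls String.split from the JDK.
final class NoDelimiterSplit {

    // Splits on "," and renders every element in quotes.
    static String describe(String input) {
        return Arrays.stream(input.split(","))
                .map(part -> "\"" + part + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(describe("a"));      // ["a"]
        System.out.println(describe(""));       // [""]
        System.out.println(describe(",test"));  // ["", "test"]
        System.out.println(describe(",test,")); // ["", "test"]
    }
}
```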

This split behavior is inherited from Java, for better or worse...
Scala does not override the definition it gets from java.lang.String.

Note that you can use the limit argument to modify the behavior:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

i.e. you can set limit=-1 to get the behavior of (all?) other languages:

@ ",a,,b,,".split(",")
res1: Array[String] = Array("", "a", "", "b")

@ ",a,,b,,".split(",", -1)  // limit=-1
res2: Array[String] = Array("", "a", "", "b", "", "")
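The positive, zero and negative limits from the quoted documentation can be compared in Java as well. The sketch below reuses the "boo:and:foo" input from the String.split Javadoc; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical demo class comparing different limit values.
final class LimitDemo {

    // Splits the input and renders every element in quotes.
    static String show(String input, String regex, int limit) {
        return Arrays.stream(input.split(regex, limit))
                .map(part -> "\"" + part + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";
        System.out.println(show(s, ":", 2));  // ["boo", "and:foo"]
        System.out.println(show(s, ":", 5));  // ["boo", "and", "foo"]
        System.out.println(show(s, "o", -1)); // ["b", "", ":and:f", "", ""]
        System.out.println(show(s, "o", 0));  // ["b", "", ":and:f"]
    }
}
```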

It seems to be well known that the Java behavior is quite confusing, but:

The behavior above can be observed from at least Java 5 to Java 8.

There was an attempt in JDK-6559590 to change the behavior so that splitting an empty string returns an empty array. However, it was soon reverted in JDK-8028321 because it caused regressions in various places. The change never made it into the initial Java 8 release.

Note: The split method wasn't in Java from the beginning (it's not in 1.0.2) but has been there since at least 1.4 (e.g. see JSR 51, circa 2002). I am still investigating...

What's unclear is why Java chose this in the first place (my suspicion is that it was originally an oversight or bug in an "edge case"), but it is now irrevocably baked into the language, and so it remains.

The empty string has no special status when splitting a string. You may use:

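// Return an empty array for "", otherwise split on commas.
Some(str)
  .filter(_ != "")
  .map(_.split(","))
  .getOrElse(Array.empty[String]) // explicit element type keeps the result an Array[String]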

Use this function:

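// Requires java.util.ArrayList, java.util.Arrays and java.util.Optional.
// A null or empty body falls back to ",", which splits to an empty array.
public static ArrayList<String> split(String body) {
    return new ArrayList<>(Arrays.asList(Optional.ofNullable(body).filter(a -> !a.isEmpty()).orElse(",").split(",")));
}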
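For reference, here is how the function above behaves on a few inputs. This is a sketch that wraps the same logic in a hypothetical class named SafeSplit so it can be compiled and run on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical wrapper class around the function shown above.
final class SafeSplit {

    // A null or empty body is replaced by ",", which splits to an empty array.
    static ArrayList<String> split(String body) {
        String safe = Optional.ofNullable(body).filter(s -> !s.isEmpty()).orElse(",");
        return new ArrayList<>(Arrays.asList(safe.split(",")));
    }

    public static void main(String[] args) {
        System.out.println(split(null).size());  // 0
        System.out.println(split("").size());    // 0
        System.out.println(split("a,b"));        // [a, b]
        System.out.println(split("a,b,,"));      // [a, b]
    }
}
```

Trailing empty fields are still dropped here, because the helper calls split without a limit.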
