Why does Java String.split() leave behind empty strings?

When I use the String.split() method, why do I sometimes get empty strings? For example, if I do:

"(something)".split("\\W+")  

Then the first element of the returned array is an empty string. The example from the documentation doesn't make sense to me either. For the string "boo:and:foo" it gives:

Regex    Result

  :      { "boo", "and", "foo" }
  o      { "b", "", ":and:f" }

How come there are no empty strings when ":" is used as the delimiter, but there are with "o"?

With:

"(something)".split("\\W+")

split() assumes that delimiters come between fields, so what you conceptually end up with is:

""   "something"   ""    <- fields
   (             )       <- delimiters
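To see which of those fields actually make it into the returned array, here is a minimal sketch. The class and method names are made up for illustration; only String.split() itself comes from the question.

```java
import java.util.Arrays;

class LeadingEmptyFieldDemo {
    // Hypothetical helper: splits on runs of non-word characters,
    // using the same regex as in the question.
    static String[] splitOnNonWord(String input) {
        return input.split("\\W+");
    }

    public static void main(String[] args) {
        // "(" matches at index 0, so the field before it is the empty string.
        // The empty field after ")" is a trailing one, and the one-argument
        // split() drops trailing empty strings.
        System.out.println(Arrays.toString(splitOnNonWord("(something)")));
        // prints [, something]
    }
}
```

So only the leading empty field survives, which matches what the question observed.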

You could fix that by first trimming the string to remove any leading or trailing delimiters, something like:

"(something)".replaceAll("^\\W*","").replaceAll("\\W*$","").split("\\W+")
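The same one-liner, pulled apart into a small helper so each step is visible. This is only a sketch; splitTrimmed is a name invented here, not a library method.

```java
import java.util.Arrays;

class TrimDelimitersDemo {
    // Hypothetical helper: strips leading and trailing non-word characters,
    // then splits what is left on runs of non-word characters.
    static String[] splitTrimmed(String input) {
        String trimmed = input
                .replaceAll("^\\W*", "")   // remove delimiters at the start
                .replaceAll("\\W*$", "");  // remove delimiters at the end
        return trimmed.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTrimmed("(something)")));
        // prints [something]
        System.out.println(Arrays.toString(splitTrimmed("(foo, bar)")));
        // prints [foo, bar]
    }
}
```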

With something like:

"boo:and:foo".split("o", 0)

you'll get:

"b"   ""   ":and:f"   <- fields
    o    o            <- delimiters

because you have consecutive delimiters (which don't exist when the delimiter is ":"), and these are deemed to have an empty field between them.

The reason you don't get trailing empty fields from the "oo" at the end of foo has to do with that limit of zero. In that case, trailing (but not leading) empty fields are removed.
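A quick way to see the effect of the limit is to compare zero with a negative value, which keeps the trailing empty fields. The wrapper below is just an illustrative sketch around String.split(regex, limit); its name is made up.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical thin wrapper so both limits can be compared side by side.
    static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        // Limit 0: trailing empty strings are discarded.
        System.out.println(Arrays.toString(splitWithLimit("boo:and:foo", "o", 0)));
        // prints [b, , :and:f]

        // Negative limit: nothing is discarded, so the two empty fields
        // produced by the final "oo" stay in the array.
        System.out.println(Arrays.toString(splitWithLimit("boo:and:foo", "o", -1)));
        // prints [b, , :and:f, , ]
    }
}
```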

If you also want to get rid of the empty fields in the middle, you can instead use "o+" as the delimiter, since that will greedily absorb consecutive o characters into a single delimiter. You can also use the replaceAll trick shown above to get rid of leading empty fields.
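As a small sketch of the "o+" variant (the class and method names are invented for the example):

```java
import java.util.Arrays;

class GreedyDelimiterDemo {
    // Hypothetical helper: treats any run of one or more 'o' characters
    // as a single delimiter.
    static String[] splitOnRunsOfO(String input) {
        return input.split("o+");
    }

    public static void main(String[] args) {
        // Each "oo" is now one delimiter, so there is no empty field in the
        // middle; the empty field after the final "oo" is trailing and dropped.
        System.out.println(Arrays.toString(splitOnRunsOfO("boo:and:foo")));
        // prints [b, :and:f]
    }
}
```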

Actually, the reason is not which delimiter you choose: in the latter case you have two o's, one right after the other. And what is between them? The empty string.

Maybe it's counterintuitive at first, and you might think it would be better to skip empty strings. But there are two very popular formats for storing data in text files: tab-separated values and comma-separated values.

Let's imagine that you want to store information about people in the format name,surname,age. For example, Peter,Green,12. But what if you want to store information about someone whose surname you don't know? It should look like Mike,,13. Then if you split by comma you get 'Mike', '', '13', and you know that the first element is the name, the second is an empty surname, and the third is the age. But if you chose to skip empty strings, you would get 'Mike', '13', and you could not tell which field is missing.
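The same record in Java, as a rough sketch. The parseRecord helper is hypothetical, and passing a limit of -1 is my own addition rather than part of the example above: it keeps a trailing empty field as well, so a record whose last value is missing still has three elements. This is plain splitting on commas, not a real CSV parser (no quoting or escaping).

```java
import java.util.Arrays;

class CsvFieldsDemo {
    // Hypothetical helper: splits a name,surname,age record on commas.
    // The negative limit keeps empty fields at the end of the record too.
    static String[] parseRecord(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // The empty surname keeps its position, so age is still at index 2.
        System.out.println(Arrays.toString(parseRecord("Mike,,13")));
        // prints [Mike, , 13]

        // With the default limit of 0, a missing last field would be dropped.
        System.out.println("Mike,Green,".split(",").length);   // prints 2
        System.out.println(parseRecord("Mike,Green,").length); // prints 3
    }
}
```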
