How exactly does the String.split() method in Java work when a regex is provided?
I'm preparing for the OCPJP exam and I ran into the following example:
class Test {
    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        String[] tokens = test.split("\\S");
        System.out.println(tokens.length);
    }
}
This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain what the split() method actually does in this case? I just don't get it...
It splits on every match of "\\S", which the regex engine sees as \S: a non-whitespace character.
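To make the difference between the Java string literal and the regex concrete, here is a minimal sketch. The class name EscapeDemo and the helper isNonWhitespace are made up for illustration; only Pattern.matches and String.length are real library calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class EscapeDemo {
    // The Java literal "\\S" holds two characters: a backslash and 'S'.
    // That two-character string is what the regex engine receives as \S.
    static final String REGEX = "\\S";

    // True when the whole input is a single non-whitespace character.
    static boolean isNonWhitespace(String s) {
        return Pattern.matches(REGEX, s);
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());       // 2
        System.out.println(isNonWhitespace("x")); // true
        System.out.println(isNonWhitespace(" ")); // false
    }
}
```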
So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character at a time, let's iterate over the characters and mark the places where a split happens (we will use a pipe | for that).
Is 'x' non-whitespace? YES, so let's mark it: | x
Is ' ' non-whitespace? NO, so we leave it as is
Is the last 'x' non-whitespace? YES, so let's mark it: | |
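The marking steps above can be reproduced with replaceAll, which uses the same regex engine as split. This is only a visualization sketch; the class MarkSplitPoints and its mark method are hypothetical names.

```java
// Hypothetical helper for visualizing where split("\\S") would cut.
class MarkSplitPoints {
    // Replace every non-whitespace character with a pipe,
    // leaving whitespace characters untouched.
    static String mark(String input) {
        return input.replaceAll("\\S", "|");
    }

    public static void main(String[] args) {
        System.out.println(mark("x x")); // | |
    }
}
```

Every pipe in the output is a character that split("\\S") consumes as a delimiter, so whatever sits between two pipes (possibly nothing) becomes one array element.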
As a result, we need to split our string at the start and at the end, which initially gives us this result array:
["", " ", ""]
   ^    ^ - here we split
But since trailing empty strings are removed, the result would be:
[""," "]     <- result
        ,""] <- removed trailing empty string
so split returns the array ["", " "], which contains only two elements.
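If you want to check this small case yourself, a sketch like the following should do. SmallSplitDemo and splitOnNonWhitespace are names I made up; the only library call is String.split.

```java
// Hypothetical demo class for the "x x" walkthrough.
class SmallSplitDemo {
    // Split on every single non-whitespace character.
    static String[] splitOnNonWhitespace(String input) {
        return input.split("\\S");
    }

    public static void main(String[] args) {
        String[] parts = splitOnNonWhitespace("x x");
        System.out.println(parts.length);        // 2
        System.out.println(parts[0].isEmpty());  // true: leading empty string is kept
        System.out.println(parts[1].equals(" ")); // true: the space between the two x's
    }
}
```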
BTW, to turn off the removal of trailing empty strings, use split(regex, limit) with a negative limit, like split("\\S", -1).
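A short sketch of the effect of the limit argument on the same "x x" input. LimitDemo and countParts are hypothetical names; a limit of 0 behaves like the one-argument split.

```java
// Hypothetical demo class comparing limit values.
class LimitDemo {
    // Number of elements produced by splitting on \S with the given limit.
    static int countParts(String input, int limit) {
        return input.split("\\S", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(countParts("x x", 0));  // 2: trailing "" removed
        System.out.println(countParts("x x", -1)); // 3: trailing "" kept
    }
}
```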
Now let's get back to your example. In the case of your data, you are splitting at each of the marked positions:
I am preparing for OCPJP
| || ||||||||| ||| |||||
which means
""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""
So this represents the following array:
[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]
but since trailing empty strings "" are removed (if their existence was caused by the split; more info at: Confusing output from String.split)
[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]
                                                     ^^ ^^ ^^ ^^ ^^
you get a result array that contains only this part:
[""," ",""," ","","","","","","","",""," ","",""," "]
which is exactly 16 elements.
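To tie the counts together, here is a small sketch run against the sentence from the question. OcpjpSplitDemo and its helpers are illustrative names only. With a negative limit you get the no_of_characters + 1 you expected, counting only the non-whitespace characters (20 + 1 = 21).

```java
// Hypothetical demo class for the full example from the question.
class OcpjpSplitDemo {
    static final String TEXT = "I am preparing for OCPJP";

    // Count of non-whitespace characters, i.e. the number of delimiters matched by \S.
    static int nonWhitespaceCount(String input) {
        return input.replaceAll("\\s", "").length();
    }

    // Number of elements returned by split on \S with the given limit.
    static int partCount(String input, int limit) {
        return input.split("\\S", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(nonWhitespaceCount(TEXT)); // 20
        System.out.println(partCount(TEXT, -1));      // 21: all pieces, trailing "" kept
        System.out.println(partCount(TEXT, 0));       // 16: five trailing "" removed
    }
}
```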