How exactly does the String.split() method in Java work when a regex is provided?

I'm preparing for the OCPJP exam and I ran into the following example:

class Test {
   public static void main(String args[]) {
      String test = "I am preparing for OCPJP";
      String[] tokens = test.split("\\S");
      System.out.println(tokens.length);
   }
}

This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain to me what the split() method actually does in this case? I just don't get it...

It splits on every match of "\\S", which the regex engine sees as \S: a single non-whitespace character.
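To make the difference in case concrete, here is a minimal sketch that contrasts "\\s" (whitespace) with "\\S" (non-whitespace) on the string from the question. The class name and the countTokens helper are made up for illustration; only String.split itself comes from the JDK.

```java
class WhitespaceVsNonWhitespace {
    // Hypothetical helper: number of elements split() returns for the given regex.
    static int countTokens(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        // "\\s" splits on each whitespace character, leaving the five words.
        System.out.println(countTokens(test, "\\s")); // 5
        // "\\S" splits on each non-whitespace character instead.
        System.out.println(countTokens(test, "\\S")); // 16
    }
}
```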

So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character, let's iterate over the characters and mark the places where a split happens (we will use a pipe | for that).

  • is 'x' non-whitespace? YES, so let's mark it: | x
  • is ' ' non-whitespace? NO, so we leave it as is
  • is the last 'x' non-whitespace? YES, so let's mark it: | |

So as a result we need to split our string at the start and at the end, which initially gives us this result array

["", " ", ""]
   ^    ^ - here we split

But since trailing empty strings are removed, the result would be

[""," "]     <- result
        ,""] <- removed trailing empty string

so split returns the array ["", " "], which contains only two elements.
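The walkthrough above can be checked with a short sketch. The class name and the show helper are hypothetical; show only wraps each element in quotes so that the empty string stays visible in the output.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SmallSplitDemo {
    // Hypothetical helper: renders each element in quotes so empty strings stay visible.
    static String show(String[] parts) {
        return Arrays.stream(parts)
                .map(s -> "\"" + s + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        String[] parts = "x x".split("\\S");
        System.out.println(parts.length); // 2
        System.out.println(show(parts));  // ["", " "]
    }
}
```

Note that the leading empty string survives here because the match at index 0 consumes a character; only trailing empty strings are dropped.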

BTW, to turn off the removal of trailing empty strings you need to use split(regex, limit) with a negative limit, like split("\\S", -1).
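As a quick sketch of the effect of the limit argument on the same "x x" input (the class and the pieces helper are illustrative names, not part of the JDK):

```java
class KeepTrailingDemo {
    // Hypothetical helper: number of elements returned for a given limit.
    static int pieces(String input, int limit) {
        return input.split("\\S", limit).length;
    }

    public static void main(String[] args) {
        // A limit of 0 behaves like split("\\S"): trailing empty strings are dropped.
        System.out.println(pieces("x x", 0));  // 2
        // A negative limit keeps them, so the array is ["", " ", ""].
        System.out.println(pieces("x x", -1)); // 3
    }
}
```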


Now let's get back to your example. In the case of your data, you are splitting on each of

I am preparing for OCPJP
| || ||||||||| ||| |||||

which means

 ""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""

So this represents the following array

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  

but since trailing empty strings "" are removed (if their existence was caused by the split; more info at: Confusing output from String.split)

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  
                                                     ^^ ^^ ^^ ^^ ^^

you get a result array which contains only this part:

[""," ",""," ","","","","","","","",""," ","",""," "]  

which is exactly 16 elements.
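The same count can be reproduced step by step. The sketch below first keeps every piece by passing a negative limit, counts the empty strings at the end, and compares that with the default call. The class name and the trailingEmpties helper are made up for this illustration.

```java
class TrailingEmptyDemo {
    // Hypothetical helper: counts the empty strings at the end of the array.
    static int trailingEmpties(String[] parts) {
        int n = 0;
        for (int i = parts.length - 1; i >= 0 && parts[i].isEmpty(); i--) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        String[] all = test.split("\\S", -1); // negative limit keeps trailing empty strings
        String[] trimmed = test.split("\\S"); // default call drops them

        System.out.println(all.length);           // 21 (20 non-whitespace characters + 1)
        System.out.println(trailingEmpties(all)); // 5
        System.out.println(trimmed.length);       // 16
    }
}
```

So the "number of matches + 1" intuition does hold, but only for the untrimmed array and only counting non-whitespace characters; the default call then drops the five trailing empty strings.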
