
Java .split method matches empty string weird behaviour

I wanted to get a list of numbers from a sequence of characters (that is, letters and digits), so I wrote this code:

class A {
  public static void main(String[] args) {
    String msg = "aa811b22";
    String[] numbers = msg.split("\\D+");
    for (int i = 0; i < numbers.length; i++) {
      System.out.println(">" + numbers[i] + "<");
    }

  }
}

Surprisingly, it runs and prints:

 $ java A
><
>811<
>22<

OK, so somehow it matched an empty string. I told myself that "" (the empty string) actually matches the non-digit regexp \\D+ . Nothing is not a digit, right? (But then why did it return only one empty string? There are infinitely many (∞) empty strings inside any string.)

To check this, I tried to extract the words from the string above:

class A {
  public static void main(String[] args) {
    String msg = "aa811b22";
    String[] words = msg.split("\\d+");
    for (int i = 0; i < words.length; i++) {
      System.out.println(">" + words[i] + "<");
    }

  }
}

which actually prints what I expected (no empty strings returned):

 $ java A
>aa<
>b<

But then I ran a few more tests that completely confused me:

System.out.println("a".split("\\D+").length);
// => 0 (why not 1? Shouldn't the empty string be here?!)
System.out.println("a1".split("\\D+").length);
// => 2 (so now it splits into an empty string and "1")
System.out.println("1a".split("\\D+").length);
// => 1 (now it returns only the expected "1" string)

So my questions are:

  • Why does split return an empty string in my examples?
  • Why does "a".split("\\D+").length return 0?
  • Why is "a1".split("\\D+").length 2 (and not 1)?
  • How does "1a".split("\\D+").length differ from "a1".split("\\D+").length when splitting?
  • Why does split return an empty string in my examples?

'a' is not a digit, so "aa" is a separator. A separator always has an element on either side of it, and the element to the left of "aa" is the empty string. If the separator were "," , then from the string ",a,b" you would expect 3 elements: "" , "a" , and "b" . Here, "aa" is the separator, just like "," in my example.
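The comma analogy can be checked directly. The following is a minimal sketch; the class name SeparatorAnalogy and the helper parts are made up for illustration and are not part of the question's code. It wraps each element in > and < so that empty strings stay visible:

```java
class SeparatorAnalogy {
    // Hypothetical helper: split the input and wrap every element in ">" and "<"
    // so that empty strings show up as "><".
    static String parts(String input, String regex) {
        StringBuilder sb = new StringBuilder();
        for (String piece : input.split(regex)) {
            sb.append('>').append(piece).append('<');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A leading "," leaves an empty element in front of it.
        System.out.println(parts(",a,b", ","));        // ><>a<>b<
        // A leading run of non-digits behaves the same way.
        System.out.println(parts("aa811b22", "\\D+")); // ><>811<>22<
    }
}
```

In both cases the first element is empty because the string starts with a separator.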

  • Why does "a".split("\\D+").length return 0?

'a' is not a digit, so it's a separator. The presence of the separator means that two substrings are split out of the original String , both of them empty, one on either side of the a . However, the single-argument split method (the one without a limit) discards trailing empty strings. They're all empty, so they're all discarded, and the length is 0 .

  • Why is "a1".split("\\D+").length 2 (and not 1)?

Only trailing empty strings are discarded, so the elements are "" and "1" .

  • How does "1a".split("\\D+").length differ from "a1".split("\\D+").length when splitting?

"1a" has one trailing empty string, which is discarded. The empty string in "a1" is leading rather than trailing, so it is kept.

It's not matching an empty string. Rather, it's matching the "aa" at the beginning of your string as a delimiter. The first element is empty because there is only an empty string before the first delimiter. In contrast, for trailing delimiters there is no empty string returned, as mentioned in the documentation for split() :

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
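To see the discarded elements, the two-argument split can be called with a negative limit, which keeps trailing empty strings. This is a minimal sketch; the class name TrailingEmptyDemo and the helper show are illustrative assumptions, not something from the question or the documentation:

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Hypothetical helper: split on runs of non-digits with an explicit limit
    // and render the resulting array as text.
    static String show(String input, int limit) {
        return Arrays.toString(input.split("\\D+", limit));
    }

    public static void main(String[] args) {
        // limit 0 is what split(regex) uses: trailing empty strings are dropped.
        // limit -1 keeps every element, including trailing empty strings.
        for (String s : new String[] {"a", "a1", "1a"}) {
            System.out.println(s + " with limit 0:  " + show(s, 0));
            System.out.println(s + " with limit -1: " + show(s, -1));
        }
    }
}
```

With limit 0 the three inputs give [] , [, 1] and [1] , which matches the lengths 0, 2 and 1 from the question. With limit -1 they give [, ] , [, 1] and [1, ] , so the only difference is the trailing empty strings.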
