
foo.split(',').length != number of ',' found in 'foo'?

Maybe it's because it's end of day on a Friday, and I have already found a work-around, but this is killing me.

I am using Java but am a .NET developer.

I have a string and I need to split it on commas. Let's say it's a row in a CSV file that has 210 columns. line.split(',').length will sometimes be 199, while the count of ',' is 209. I even counted in 2 different ways to be sure (using a regex, then, after losing my sanity, manually looping through and checking each character).

What's the super-obvious-hit-face-on-desk thing I'm missing here? Why isn't foo.split(delim).length == CountOfOccurences(foo,delim) all the time, only sometimes?

thanks much

First, there's an obvious difference of one. If there are 200 columns, all with text, there are 199 commas. Second, Java drops trailing empty strings by default. You can change this by passing a negative number as the second argument.

"foo,,bar,baz,,".split(",")

is:

{foo,,bar,baz}

an array of 4 elements. But

"foo,,bar,baz,,".split(",", -1)

is:

{foo,,bar,baz,,}

with all 6.

Note that only trailing empty strings are dropped by default.

Finally, don't forget that the delimiter String is compiled into a regex. This is not applicable here, since , is not a special character, but you should keep it in mind.
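To illustrate the regex point, here is a minimal sketch of what can happen when the delimiter is a regex metacharacter such as `.`. The class name `SplitRegexDemo` and the helper `splitLiteral` are hypothetical names made up for this example; `Pattern.quote` is the standard way to have a string treated literally.

```java
import java.util.regex.Pattern;

public class SplitRegexDemo {
    // Hypothetical helper: split on a literal delimiter (not a regex),
    // keeping trailing empty strings by passing a negative limit.
    public static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String line = "a.b.c";

        // "." as a regex matches every character, so every piece is an
        // empty string, and all of them are dropped as trailing empties.
        System.out.println(line.split(".").length);            // 0

        // Quoting the delimiter makes it match only a literal dot.
        System.out.println(splitLiteral(line, ".").length);    // 3
    }
}
```

With a plain comma none of this matters, but it is worth remembering if the delimiter ever changes to `|`, `.`, or another regex metacharacter.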

There are a couple things happening. First, if you have three items like a,b,c and split on comma, you'll have three entries, one more than the number of commas.

But what you're dealing with probably comes from consecutive delimiters, for example: a,,,,b,c,,,,,

The ones at the end get dropped. Check the Java documentation for the split function: http://download.java.net/jdk7/docs/api/java/lang/String.html
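As a rough sketch of that behaviour, the following compares the comma count of the example row with the lengths returned by `split`. The class name `SplitCountDemo` and the helper `countChar` are hypothetical names for this illustration only.

```java
public class SplitCountDemo {
    // Hypothetical helper: count occurrences of a single character.
    public static int countChar(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String row = "a,,,,b,c,,,,,";

        System.out.println(countChar(row, ','));        // 10 commas
        // Default split drops the five trailing empty strings.
        System.out.println(row.split(",").length);      // 6
        // A negative limit keeps them: always commas + 1 fields.
        System.out.println(row.split(",", -1).length);  // 11
    }
}
```

With a negative limit, the length is always one more than the number of delimiters, which is the relationship the question expected.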

As others have pointed out, String.split has some very non-intuitive behaviour.

If you're using Google's Guava open-source Java library, there's a Splitter class which gives a much nicer (in my opinion) API for this, with more flexibility:

String input = "foo, bar,";

Splitter.on(',').split(input);
// returns "foo", " bar", ""

Splitter.on(',').omitEmptyStrings().split(input);
// returns "foo", " bar"

Splitter.on(',').omitEmptyStrings().trimResults().split(input);
// returns "foo", "bar"

Short example: foo = "1,2" and

foo.split(",").length = 2
 count(foo, ",") = 1

Probably you have a mistake in your code. Here is an example in Java code:

       // needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
       String row = "1,2,3,4,,5"; // second example: 1,2,3,5,,
       // prints 6 for the first example, 4 for the second (trailing empty strings are dropped)
       System.out.println(row.split(",").length);

       // count how many commas the row contains
       Pattern pattern = Pattern.compile(",");
       Matcher m = pattern.matcher(row);

       int nr = 0;
       while (m.find()) {
           nr++;
       }
       System.out.println(nr); // prints 5 for both examples

Is it omitting blanks?

Do you have something like "a,b,c,,d,e" or trailing delimiters like "a,b,c,,,,"?

Are there extra delimiters in the cell data?
