
String.split vs StringTokenizer on efficiency level

I work with a large-scale data set, so I am interested in the most efficient way to split a String.

I found the questions "Scanner vs. StringTokenizer vs. String.Split" and "string tokenizer in Java", which pretty much state that I should not use StringTokenizer.

I was convinced not to use it until I checked @Neil Coffey's experiment chart in the second post, "Performance of string tokenisation: String.split() and StringTokenizer compared", where StringTokenizer is notably faster.

So my question is: should I avoid a class because it is legacy (as is officially stated), or should I go for it anyway? I must admit that efficiency is crucial in my project. Shouldn't String.split be at least comparably fast?

Is there any other fast string split alternative?

Efficient and more feature-rich string-splitting methods are available in the Google Guava library.

Guava's split method

Ex:

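import com.google.common.base.Splitter;

// Split on commas, trim whitespace around each piece,
// and drop pieces that are empty after trimming.
Iterable<String> splitted = Splitter.on(',')
    .omitEmptyStrings()
    .trimResults()
    .split("one,two,,   ,three");

for (String text : splitted) {
  System.out.println(text);
}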

Output:

one
two
three
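
For comparison with the JDK-only options from the question, here is a minimal sketch that runs the same input through String.split, StringTokenizer, and a hand-rolled indexOf loop. The class name SplitComparison and the three helper methods are hypothetical names chosen for illustration, and the indexOf loop is just one common regex-free approach, not something taken from the linked posts. It only shows how the results differ; it does not measure speed, so benchmark on your own data before choosing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; all names here are illustrative assumptions.
class SplitComparison {

    // String.split takes a regex. It keeps empty strings between
    // delimiters but drops trailing empty strings.
    static List<String> withSplit(String input) {
        return Arrays.asList(input.split(","));
    }

    // StringTokenizer skips over runs of delimiters,
    // so it never returns an empty token.
    static List<String> withTokenizer(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Regex-free loop over indexOf. It keeps every field,
    // including leading, interior, and trailing empty ones.
    static List<String> withIndexOf(String input, char delim) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delim, start)) >= 0) {
            out.add(input.substring(start, end));
            start = end + 1;
        }
        out.add(input.substring(start));
        return out;
    }

    public static void main(String[] args) {
        String input = "one,two,,   ,three";
        System.out.println(withSplit(input));
        System.out.println(withTokenizer(input));
        System.out.println(withIndexOf(input, ','));
    }
}
```

Note that none of the three trims whitespace or filters blank fields the way the Guava example does, so the "   " field survives in all of them. The handling of empty fields also differs between the three, which matters if you swap one for another purely for speed.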
