
Java: split a string with a regex

I want to split a string, treating every non-alphabetic character as a separator.

String[] word_list = line.split("[^a-zA-Z]");

But with the following input

11:11 Hello World

word_list contains many empty strings before "Hello" and "World".

Could you please tell me why? Thank you.

Here's your string, where each ^ character shows a match for [^a-zA-Z] :

11:11 Hello World
^^^^^^     ^

The split method finds each of these matches and returns the substrings before, between, and after them. Since there are six consecutive matches before any useful data, you end up with six empty substrings before you get the string "Hello": one before the first match and one between each adjacent pair of matches.

To prevent this, you can manually filter the result to ignore any empty strings.
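As a minimal sketch of that filtering approach (the class name SplitFilterDemo and the helper splitWords are made up for illustration, and it assumes Java 8 or later for streams):

```java
import java.util.Arrays;

class SplitFilterDemo {
    // Hypothetical helper: split on every single non-letter,
    // then drop the empty strings that consecutive separators produce.
    static String[] splitWords(String line) {
        return Arrays.stream(line.split("[^a-zA-Z]"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String line = "11:11 Hello World";

        // Raw result of the original call, empty strings included
        System.out.println(Arrays.toString(line.split("[^a-zA-Z]")));
        // prints [, , , , , , Hello, World]

        // Same split with the empty strings filtered out
        System.out.println(Arrays.toString(splitWords(line)));
        // prints [Hello, World]
    }
}
```

The regex stays exactly as in the question; only the post-processing changes.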

This happens because your regular expression matches each non-alphabetic character individually. It is like splitting

",,,,,,Hello,World"

on commas.

You want an expression that matches an entire run of non-alphabetic characters at once, such as:

line.split("[^a-zA-Z][^a-zA-Z]*")

You will still get one leading empty string with your example, since it is like splitting ",Hello,World" on commas: split keeps the empty string in front of a separator that sits at the very start of the input.
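One possible way to get rid of that leading empty string is to strip the leading separators before splitting. The sketch below is only an illustration: SplitRunsDemo and splitWords are hypothetical names, and it writes the pattern as "[^a-zA-Z]+", which matches the same runs as "[^a-zA-Z][^a-zA-Z]*".

```java
import java.util.Arrays;

class SplitRunsDemo {
    // Hypothetical helper: treat each run of non-letters as one separator.
    static String[] splitWords(String line) {
        // Remove a leading run of non-letters so no separator sits at index 0
        String stripped = line.replaceFirst("^[^a-zA-Z]+", "");
        if (stripped.isEmpty()) {
            // "".split(...) would return an array holding one empty string
            return new String[0];
        }
        // Trailing empty strings are dropped by split(regex) on its own
        return stripped.split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String line = "11:11 Hello World";

        // Without stripping, one leading empty string remains
        System.out.println(Arrays.toString(line.split("[^a-zA-Z]+")));
        // prints [, Hello, World]

        System.out.println(Arrays.toString(splitWords(line)));
        // prints [Hello, World]
    }
}
```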

Will the following do?

String[] word_list = line.replaceAll("[^a-zA-Z ]","").replaceAll(" +", " ").trim().split("[^a-zA-Z]");

Here I first remove every character that is neither a letter nor a space, then collapse runs of spaces into a single space, trim the ends, and finally split on the remaining spaces.
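To make the chain easier to follow, here it is broken into steps with the intermediate values for the sample input. This is just an unrolled sketch of the one-liner above (ReplaceThenSplitDemo and the variable names are invented for illustration). It also shows one side effect worth knowing: words separated only by punctuation are glued together, because the punctuation is deleted rather than treated as a separator.

```java
import java.util.Arrays;

class ReplaceThenSplitDemo {
    // Hypothetical helper: the one-liner from the answer, step by step.
    static String[] splitWords(String line) {
        // "11:11 Hello World" -> " Hello World" (digits and ':' removed)
        String lettersAndSpaces = line.replaceAll("[^a-zA-Z ]", "");
        // Collapse runs of spaces into one space
        String singleSpaced = lettersAndSpaces.replaceAll(" +", " ");
        // " Hello World" -> "Hello World"
        String trimmed = singleSpaced.trim();
        // Only single spaces are left as separators at this point
        return trimmed.split("[^a-zA-Z]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("11:11 Hello World")));
        // prints [Hello, World]

        // The comma is deleted, so "foo" and "bar" become one word
        System.out.println(Arrays.toString(splitWords("foo,bar baz")));
        // prints [foobar, baz]
    }
}
```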
