
String splitting on 3 or more words

I have code that splits a string into groups of 2 words and puts them in an array. It turns

String joined = "chill hit donkey chicken car roast pink rat tree";

into

[chill hit, donkey chicken, car roast, pink rat, tree]

This is my code for that:

  String[] result = joined.split("(?<!\\G\\S+)\\s");
  System.out.printf("%s%n", Arrays.toString(result));

Now, how do I modify the regex so that it splits into groups of 3 or more words?

Output (3 words per array element):

[chill hit donkey, chicken car roast, pink rat tree]

Output (4 words per array element):

[chill hit donkey chicken, car roast pink rat, tree]

I tried modifying the regex, but nothing has worked so far. Thanks.

You can use this regex (with Matcher.find() rather than split()):

((?:\w+\s){2}(?:\w+)) (Replace `2` with `3` for 4 words)


Java Code

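import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "chill hit donkey chicken car roast pink rat tree";
// Two "word + whitespace" pairs followed by a third word.
String pattern = "((?:\\w+\\s){2}(?:\\w+))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);

// Only complete groups match; a shorter trailing group is not reported.
while (m.find()) {
    System.out.println(m.group(1));
}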


To split the text into groups of N words, use this pattern, replacing N-1 with the actual number:

((?:\\w+\\s){N-1}(?:\\w+))

For groups of 2 words that is ((?:\\w+\\s){1}(?:\\w+)), for groups of 3 words it is ((?:\\w+\\s){2}(?:\\w+)), and so on.
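
The find() patterns above only report complete groups, so a shorter final group (such as `tree` when grouping by 4) is left out. As a minimal sketch, not taken from the answers, the pattern can instead match one word followed by up to N-1 further words, which keeps the remainder. The class `WordGrouper` and the helper `groupWords` are hypothetical names, and the sketch assumes that a "word" is any run of non-whitespace (`\S+`) rather than `\w+`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordGrouper {
    // Hypothetical helper: returns groups of up to n words,
    // keeping a shorter final group instead of dropping it.
    static List<String> groupWords(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // One word, then greedily up to n-1 more "whitespace + word" pairs.
        Pattern p = Pattern.compile("\\S+(?:\\s+\\S+){0," + (n - 1) + "}");
        Matcher m = p.matcher(text);
        List<String> groups = new ArrayList<>();
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "chill hit donkey chicken car roast pink rat tree";
        System.out.println(groupWords(line, 3));
        System.out.println(groupWords(line, 4));
    }
}
```

With the sample string, grouping by 3 gives the three full groups from the question, and grouping by 4 gives two full groups plus `tree` on its own.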

Here is another find() version; just change {3} to whatever number you like.


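import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Unescaped regex: ((?:\w+\W?){3})(?:(\W+|$))
String text = "chill hit donkey chicken car roast pink rat tree";
String regex = "((?:\\w+\\W?){3})(?:(\\W+|$))";
Matcher m = Pattern.compile(regex).matcher(text);
// Group 1 holds the three words; the trailing separator is matched outside it.
while (m.find()) {
    System.out.println(String.format("'%s'", m.group(1)));
}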


Output

'chill hit donkey'
'chicken car roast'
'pink rat tree'

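Extend the lookbehind so that it blocks the split whenever fewer than N words have been read since the last split: one word followed by up to N-2 further "whitespace+nonwhitespace" combinations. For 3 words:

joined.split("(?<!\\G\\S+(?:\\s+\\S+){0,1})\\s");

For 4 words, raise the upper bound to 2, and so on:

joined.split("(?<!\\G\\S+(?:\\s+\\S+){0,2})\\s");

Like the two-word pattern in the question, this relies on Java accepting \G together with an unbounded \S+ inside a lookbehind.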
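
The split() patterns above depend on `\G` inside a lookbehind. If that feels too fragile, one alternative (a different technique, not from the answers) is to split on whitespace first and then rejoin the words in chunks. The sketch below assumes words are separated by arbitrary whitespace and rejoins them with a single space; `ChunkJoiner` and `chunkWords` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkJoiner {
    // Hypothetical helper: splits on whitespace, then rejoins every n words.
    static List<String> chunkWords(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> groups = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return groups; // no words, no groups
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i += n) {
            // The last chunk may hold fewer than n words.
            int end = Math.min(i + n, words.length);
            groups.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        String joined = "chill hit donkey chicken car roast pink rat tree";
        System.out.println(chunkWords(joined, 3));
        System.out.println(chunkWords(joined, 4));
    }
}
```

This trades the single-regex solution for a loop, but the group size becomes an ordinary parameter and no lookbehind is involved.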

