How to split a String with two delimiters and keep only one of them?

I want to split a String on punctuation marks and whitespace, but keep the punctuation marks. For example:

String example = "How are you? I am fine!";

I want to have as a result

["How","are","you","?","I","am","fine","!"]

but instead I get

["how"," ","are"," ","you"," ","?"," ","i"," ","am"," ","fine"," ","!"].

What I used was example.toLowerCase().trim().split("(?<=\\b|[^\\p{L}])");

Why are you calling toLowerCase()? That alone changes your expected result. And why call trim() on the full string?

Doing this with a single split call is probably not straightforward.

An alternative would be to just filter out the unwanted entries:

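import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;

String example = "How are you? I am fine!";

// Split at every word boundary, strip each piece, then drop the empty ones.
// Without the strip step the piece "? " would keep its trailing space.
Pattern pattern = Pattern.compile("\\b");
String[] result = pattern.splitAsStream(example)
    .map(String::strip)
    .filter(Predicate.not(String::isEmpty))
    .toArray(String[]::new);

System.out.println(Arrays.toString(result));

Output:

[How, are, you, ?, I, am, fine, !]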

Reacting to your comment about wanting [How,are,you,?,I,am,fine,!] as output: the ", " separators are inserted by Arrays.toString; they are not part of the array elements. To print without them, don't use Arrays.toString; build the string yourself:

System.out.println("[" + String.join(",", result) + "]");
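For reference, here is a minimal, self-contained sketch of this approach, assuming Java 11 or later (for String.strip). The class name BoundarySplitDemo and the helper tokenize are made up for illustration. Each piece is stripped before filtering so that a token such as "? " does not carry a trailing space into the result.

```java
import java.util.regex.Pattern;

public class BoundarySplitDemo {
    // Hypothetical helper: splits at every word boundary, strips each piece,
    // and drops the pieces that end up empty.
    static String[] tokenize(String text) {
        return Pattern.compile("\\b")
            .splitAsStream(text)
            .map(String::strip)                 // "? " becomes "?", " " becomes ""
            .filter(piece -> !piece.isEmpty())
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] result = tokenize("How are you? I am fine!");
        // Join manually to avoid the ", " separator that Arrays.toString inserts.
        System.out.println("[" + String.join(",", result) + "]");
        // prints [How,are,you,?,I,am,fine,!]
    }
}
```

Note that consecutive punctuation marks are not separated by a word boundary, so input such as "?!" would stay together as one token with this sketch.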

You can do it as follows:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String example = "How are you? I am fine!";
        String[] arr = example.split("\\s+|\\b(?=\\p{Punct})");
        System.out.println(Arrays.toString(arr));
    }
}

Output:

[How, are, you, ?, I, am, fine, !]

Explanation of the regex:

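  1. \\s+ matches one or more whitespace characters.
  2. \\b matches a word boundary.
  3. (?=\\p{Punct}) is a positive lookahead for a punctuation character, so \\b(?=\\p{Punct}) matches the zero-width position between a word and the punctuation mark that follows it. Nothing is consumed there, which is why the punctuation is kept.
  4. | is alternation (OR).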
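To see what each alternative contributes, the sketch below applies the two halves of the regex separately and then together. The class name RegexPartsDemo and the helper splitOn are made up for illustration, and the second input string is an extra test case that does not appear in the question.

```java
import java.util.Arrays;

public class RegexPartsDemo {
    // Hypothetical helper: thin wrapper around String.split for readability.
    static String[] splitOn(String regex, String text) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        String example = "How are you? I am fine!";

        // Whitespace only: punctuation stays attached to the preceding word.
        System.out.println(Arrays.toString(splitOn("\\s+", example)));
        // prints [How, are, you?, I, am, fine!]

        // Zero-width split before punctuation only: whitespace is kept.
        System.out.println(Arrays.toString(splitOn("\\b(?=\\p{Punct})", example)));
        // prints [How are you, ? I am fine, !]

        // Both alternatives combined.
        System.out.println(Arrays.toString(splitOn("\\s+|\\b(?=\\p{Punct})", example)));
        // prints [How, are, you, ?, I, am, fine, !]

        // Consecutive punctuation has no word boundary inside it, so it is not split further.
        System.out.println(Arrays.toString(splitOn("\\s+|\\b(?=\\p{Punct})", "Wait... what?!")));
        // prints [Wait, ..., what, ?!]
    }
}
```

The last case shows a limitation worth knowing: because \\b only matches next to a word character, a run such as "..." or "?!" comes back as a single token.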
