I want to split a String at punctuation marks and whitespace, but keep the punctuation marks. Example:
String example = "How are you? I am fine!";
I want to get this as a result:
["How","are","you","?","I","am","fine","!"]
but instead I get
["how"," ","are"," ","you"," ","?"," ","i"," ","am"," ","fine"," ","!"].
What I used was example.toLowerCase().trim().split("(?<=\\b|[^\\p{L}])");
Why are you doing toLowerCase()? This already messes up your expected result. And why the trim() on the full string?
Doing this with a single split call is probably not that simple.
An alternative would be to just filter out the unwanted entries:
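import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;
String example = "How are you? I am fine!";
// Split at every word boundary; this also yields whitespace-only pieces.
Pattern pattern = Pattern.compile("\\b");
String[] result = pattern.splitAsStream(example)
    .map(String::strip)                      // "? " becomes "?"
    .filter(Predicate.not(String::isBlank))  // drop whitespace-only pieces (Java 11+)
    .toArray(String[]::new);
System.out.println(Arrays.toString(result));
Output:
[How, are, you, ?, I, am, fine, !]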
Reacting to your comment about wanting [How,are,you,?,I,am,fine,!] as output: simply don't print with Arrays.toString, but build the string yourself. The spaces after the commas are added by Arrays.toString; they are not part of the array elements.
System.out.println("[" + String.join(",", result) + "]");
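If splitting gets awkward, another option (not taken from the answers here) is to match the tokens themselves instead of the gaps between them. The following is a minimal sketch under that assumption. The class name `TokenMatcher`, the method `tokenize`, and the pattern are illustrative choices, and the pattern treats every punctuation character as its own token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenMatcher {
    // Hypothetical token pattern: a run of letters/digits, or one punctuation character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\p{Punct}");

    // Collects every match in order; whitespace never matches, so it is skipped.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("How are you? I am fine!"));
        // prints [How, are, you, ?, I, am, fine, !]
    }
}
```

With this approach no filtering or trimming step is needed, because anything the pattern does not describe is simply never returned.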
You can do it as follows:
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String example = "How are you? I am fine!";
        // Split on whitespace runs, or at a word boundary directly before punctuation.
        String[] arr = example.split("\\s+|\\b(?=\\p{Punct})");
        System.out.println(Arrays.toString(arr));
    }
}
Output:
[How, are, you, ?, I, am, fine, !]
Explanation of the regex:
\\s+ matches one or more whitespace characters
\\b matches a word boundary
(?=\\p{Punct}) is a positive lookahead for a punctuation character
| is the alternation (OR)
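To see what each alternative contributes, it can help to run the two halves of the regex separately and then together. This is only a sketch; the class name `SplitParts` and its helper methods are made up for illustration.

```java
import java.util.Arrays;

class SplitParts {
    // Whitespace only: punctuation stays attached to the preceding word.
    public static String[] onWhitespace(String s) {
        return s.split("\\s+");
    }

    // Zero-width split at a word boundary that is directly followed by punctuation.
    public static String[] beforePunctuation(String s) {
        return s.split("\\b(?=\\p{Punct})");
    }

    // Both alternatives combined, as in the answer above.
    public static String[] combined(String s) {
        return s.split("\\s+|\\b(?=\\p{Punct})");
    }

    public static void main(String[] args) {
        String example = "How are you? I am fine!";
        System.out.println(Arrays.toString(onWhitespace(example)));      // [How, are, you?, I, am, fine!]
        System.out.println(Arrays.toString(beforePunctuation(example))); // [How are you, ? I am fine, !]
        System.out.println(Arrays.toString(combined(example)));          // [How, are, you, ?, I, am, fine, !]
    }
}
```

Because \\b needs a word character on one side, punctuation that follows other punctuation is not split apart: for "Wait..." the combined regex yields "Wait" and "...".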