I want to split text into words, to count the number of words.
This is how I imagine it to be:
int words = text.split("[\\p{Punct}*\\p{Space}*]").length;
I've tried multiple combinations, but it seems to split into too many parts. For example,
"word1 word2"
...has 8 words with this regex; I want it to be only 2.
int countWords(String input) {
    return input.trim().split("\\s+").length;
}
A word is just text surrounded by whitespace. Parsing words from a String can be done by calling String.split() with "\\s+" as the delimiter.
Note that "\\s+" is a regular expression. It matches any run of one or more whitespace characters (such as a space, a tab, or a newline).
int words = text.trim().split("\\s+").length;
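One edge case worth knowing about: in Java, splitting an empty string returns a one-element array containing the empty string, so the one-liner above reports 1 word for empty or all-whitespace input. A minimal sketch that guards against this follows; the class name WordCount and the decision to return 0 for null are assumptions for illustration, not part of the original answer.

```java
final class WordCount {
    // Counts whitespace-separated words; returns 0 for null, empty, or blank input.
    static int countWords(String input) {
        if (input == null) {
            return 0; // assumption: treat null as "no words"
        }
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            // "".split("\\s+") would yield [""], i.e. length 1, so handle it here.
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("word1 \t\n word2")); // mixed whitespace run
        System.out.println(countWords("   "));              // blank input
    }
}
```

The guard only changes the result for empty and blank input; for everything else it behaves like text.trim().split("\\s+").length.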
Try the following regex:
[\\p{Punct}\\p{Space}]+
The problem with your current regex is that it matches exactly one character, and thus separately matches each whitespace character between word1 and word2. Inside the brackets, * is a literal asterisk, not a quantifier. Placing the repetition operator outside the character class fixes that.
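To make the difference concrete, here is a small sketch that runs both patterns on the same input. The class and method names (SplitDemo, singleCharCount, runCount) and the sample string are made up for illustration.

```java
final class SplitDemo {
    // Pattern from the question: the * characters sit inside the brackets,
    // so the class matches exactly one delimiter character per match.
    static int singleCharCount(String text) {
        return text.split("[\\p{Punct}*\\p{Space}*]").length;
    }

    // Suggested pattern: + outside the brackets makes a whole run of
    // punctuation/whitespace count as a single delimiter.
    static int runCount(String text) {
        return text.split("[\\p{Punct}\\p{Space}]+").length;
    }

    public static void main(String[] args) {
        // Three delimiter characters in a row: comma, space, tab.
        String text = "word1, \tword2";
        System.out.println(singleCharCount(text)); // empty strings between delimiters are counted
        System.out.println(runCount(text));
    }
}
```

Note that if the text starts with punctuation or whitespace, split() still returns a leading empty string with either pattern, so trimming or filtering empty tokens may be needed in that case.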
Use Guava: define a Splitter as a constant:
private static final Splitter WORD_SPLITTER =
    Splitter.on(CharMatcher.JAVA_LETTER_OR_DIGIT.negate())
        .trimResults()
        .omitEmptyStrings();
and use it in your code:
int words = Iterables.size(WORD_SPLITTER.split(yourString));
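For readers who do not want the Guava dependency, the same idea (a word is a maximal run of letters or digits) can be sketched in plain Java. This is an illustrative equivalent under that assumption, not the Guava implementation; the class name LetterDigitWordCount is hypothetical.

```java
final class LetterDigitWordCount {
    // Counts maximal runs of letter-or-digit characters, which corresponds to
    // splitting on every other character and dropping empty pieces.
    static int countWords(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean wordChar = Character.isLetterOrDigit(text.charAt(i));
            if (wordChar && !inWord) {
                count++; // a new run of word characters starts here
            }
            inWord = wordChar;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello, world! 42"));
    }
}
```

With this definition an apostrophe is a separator, so "don't" counts as two words; whether that is acceptable depends on what should count as a word.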