ContainsIgnoreCase in stream filter to count occurrences of one particular word in a list of String

I want to count the occurrences of a single word in a List of String in Java. The task seems easy, but I ran into a problem with words that start with a capital letter or have a , or . at the end. My method looks like this:

public static Long countWordOccurence(List<String> wordList, String word) {

    return wordList.stream()
        .filter(s -> word.contains(s))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
        .values()
        .stream()
        .findFirst()
        .orElse((long) -1);
  }

The code above works fine in the normal scenario, but the problem occurs for corner cases such as a comma at the end of the string, like Test, or a string that starts with a capital letter.

I am splitting my string into a list like this:

Arrays.asList(TEXT_TO_PARSE.split(" ")); 

If possible, I would prefer to avoid additional dependencies, but if one is necessary I will not object.

I will be grateful for a suggestion on how to fix my filter clause in a stream to count strings properly.

There are several fundamental problems with your code.

  • .filter(s -> word.contains(s)) performs a substring search: it keeps every list element that is a substring of word. Contrary to your question's title, it does not ignore case. Still, strings with different content can pass the filter

  • .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) creates groups according to the string's actual content. So when multiple different strings passed the previous filter, multiple groups may exist

  • .values().stream().findFirst() : since the groupingBy created a map with an unspecified ordering, this will pick an arbitrary group. Besides that, it's a very inefficient way to ask just for the count()

  • .orElse((long) -1) The value -1 is a very strange fall-back for counting, as the most natural answer would be “zero” when there are no matches.
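
To make the first two points concrete, here is a minimal sketch that runs the question's filter and grouping steps but returns the whole map instead of picking one value. The class name, the helper name groups and the sample list are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingPitfallDemo {
    // Same filter and grouping as in the question, but the full map is returned
    static Map<String, Long> groups(List<String> wordList, String word) {
        return wordList.stream()
            .filter(s -> word.contains(s))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        List<String> words = Arrays.asList("t", "es", "test", "test", "Test", "test,");
        // "t", "es" and "test" are all substrings of "test", so three groups exist;
        // "Test" and "test," are not substrings of "test" and are dropped.
        // The iteration order of the printed map is unspecified.
        System.out.println(groups(words, "test"));
    }
}
```

Since three groups exist here, findFirst() on the map's values may return the count of any of them.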

So a straight-forward solution would be

public static long countWordOccurence(List<String> wordList, String word) {
    return Collections.frequency(wordList, word);
}

for counting case sensitive matches or

public static long countWordOccurence(List<String> wordList, String word) {
    return wordList.stream().filter(word::equalsIgnoreCase).count();
}

for counting case insensitive.
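
As a quick check of what these two variants do with a list produced by split(" "), here is a small sketch. The class name, the method names and the sample sentence are assumptions made for the demonstration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListCountDemo {
    // Case sensitive: counts elements equal to word
    static long countCaseSensitive(List<String> wordList, String word) {
        return Collections.frequency(wordList, word);
    }

    // Case insensitive: counts elements equal to word, ignoring case
    static long countIgnoreCase(List<String> wordList, String word) {
        return wordList.stream().filter(word::equalsIgnoreCase).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample text, split the same way as in the question
        String text = "Test, this is a test. Another Test and one more test";
        List<String> words = Arrays.asList(text.split(" "));
        System.out.println(countCaseSensitive(words, "test")); // prints 1
        System.out.println(countIgnoreCase(words, "test"));    // prints 2
    }
}
```

Both methods compare whole list elements, so "Test," and "test." are still not counted: the punctuation stays attached to the word when splitting on a single space.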

But that's an XY problem anyway.

When you want to count occurrences of a word in a string, it's not necessary to split the string into words and convert the array into a list before performing the actual search (by the way, you can stream over an array directly).
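
The remark about streaming over an array directly can be sketched as follows, using Arrays.stream on the result of split. Splitting on runs of non-letter characters is an assumption of this sketch, not something proposed above; it is one way to keep punctuation from sticking to the words. The class and method names are made up.

```java
import java.util.Arrays;

class ArrayStreamDemo {
    // Streams over the array returned by split, without building a List first.
    // Assumption: words are separated by one or more non-letter characters.
    static long countIgnoreCase(String sentence, String word) {
        return Arrays.stream(sentence.split("[^\\p{L}]+"))
            .filter(word::equalsIgnoreCase)
            .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample text
        String text = "Test, this is a test. Another Test and one more test";
        System.out.println(countIgnoreCase(text, "test")); // prints 4
    }
}
```

If the sentence starts with a non-letter, split produces a leading empty string, which is harmless here because it never equals a real word.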

You can use

public static long countWordOccurence(String sentence, String word) {
    if(!word.codePoints().allMatch(Character::isLetter))
        throw new IllegalArgumentException(word+" is not a word");
    Pattern p = Pattern.compile("\\b"+word+"\\b");
    return p.matcher(sentence).results().count();
}

for a count of case sensitive matches and

public static long countWordOccurence(String sentence, String word) {
    if(!word.codePoints().allMatch(Character::isLetter))
        throw new IllegalArgumentException(word+" is not a word");
    Pattern p = Pattern.compile("\\b"+word+"\\b", Pattern.CASE_INSENSITIVE);
    return p.matcher(sentence).results().count();
}

for the case insensitive matches. The \\b pattern denotes word boundaries, which only makes sense if the search string is actually a word. So the methods above have a pre-test for that, which also ensures that the word does not contain characters that could be misinterpreted as regex patterns.
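
Here is a small runnable sketch of both variants side by side, to show the effect of the word boundaries and of the pre-test. The class name, the shared private helper and the sample sentence are assumptions for illustration; the helper only avoids repeating the same body twice.

```java
import java.util.regex.Pattern;

class RegexCountDemo {
    // Hypothetical shared helper: validates the word, then counts whole-word matches
    private static long count(String sentence, String word, int flags) {
        if (!word.codePoints().allMatch(Character::isLetter))
            throw new IllegalArgumentException(word + " is not a word");
        Pattern p = Pattern.compile("\\b" + word + "\\b", flags);
        return p.matcher(sentence).results().count(); // Matcher.results() needs Java 9+
    }

    static long countCaseSensitive(String sentence, String word) {
        return count(sentence, word, 0);
    }

    static long countIgnoreCase(String sentence, String word) {
        return count(sentence, word, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        // Hypothetical sample text
        String text = "Test, this is a test. Testing another Test and one more test";
        System.out.println(countCaseSensitive(text, "test")); // prints 2
        System.out.println(countIgnoreCase(text, "test"));    // prints 4
    }
}
```

"Testing" is not counted in either case, because there is no word boundary between "Test" and the following "i". A search string such as "a.b" is rejected by the pre-test with an IllegalArgumentException instead of being interpreted as a regex.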

The results() method was introduced in Java 9. This answer shows a solution for creating such a stream under Java 8. However, for a task as simple as counting occurrences, the alternative is not to use streams at all:

public static long countWordOccurence(String sentence, String word) {
    if(!word.codePoints().allMatch(Character::isLetter))
        throw new IllegalArgumentException(word+" is not a word");
    Pattern p = Pattern.compile("\\b"+word+"\\b", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(sentence);
    int count = 0;
    // each successful find() advances to the next match, i.e. one more occurrence
    while(m.find()) count++;
    return count;
}
