Count frequency of each word from list of Strings using Java8

I have two lists of strings. I need to create a map that counts, for each string in one list, how many strings in the other list contain it. If a string appears more than once within a single string, it should still be counted as one occurrence.

For example:

String[] listA = {"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather", "how are you"};

The Map<String, Integer> map takes the strings from listA as keys and their occurrence counts as values. So the map will hold the key-value pairs ("the", 2), ("you", 1), ("how", 2).

Note: Although "the" appears twice in "the dog ate the food", it is counted as only one occurrence because both appearances are in the same string.

How do I write this using Java 8? I tried this approach, but it does not work:

Set<String> sentenceSet = Stream.of(listB).collect(Collectors.toSet());

Map<String, Long> frequency1 =  Stream.of(listA)
    .filter(e -> sentenceSet.contains(e))
    .collect(Collectors.groupingBy(t -> t, Collectors.counting()));

You need to extract all the words from listB, deduplicated within each sentence, and keep only those that are also listed in listA. Then you simply collect the word -> count pairs into a Map<String, Long>:

String[] listA={"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather" , "how are you"};

Set<String> qualified = new HashSet<>(Arrays.asList(listA));   // make searching easier

Map<String, Long> map = Arrays.stream(listB)   // stream the sentences
    .map(sentence -> sentence.split("\\s+"))   // split by words to Stream<String[]>
    .flatMap(words -> Arrays.stream(words)     // flatmap to Stream<String>
                            .distinct())       // ... as distinct words by sentence
    .filter(qualified::contains)               // keep only the qualified words
    .collect(Collectors.groupingBy(            // collect to the Map
        Function.identity(),                   // ... the key is the word itself
        Collectors.counting()));               // ... the value is its frequency

Output:

{the=2, how=2, you=1}

I suggest creating a hash table of the items in the first list. Then loop through the items in the second list, checking whether each word is in the hash table. When adding the elements of the first list, test whether each one is already there and decide whether you want to keep a count. You can, for instance, store which sentences a word appears in as the value for the key.
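A minimal sketch of this loop-based idea, under some assumptions: the class name SentenceCount and the helper sentencesByWord are hypothetical names chosen for illustration, sentences are split on whitespace only, and the value stored for each key is the set of indexes of the sentences containing that word. The size of each set then gives the per-sentence count asked for in the question.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class SentenceCount {

    // Hypothetical helper: maps each word of listA to the indexes of the
    // sentences in listB that contain it.
    public static Map<String, Set<Integer>> sentencesByWord(String[] listA, String[] listB) {
        // Build the hash table from the first list; putIfAbsent ignores duplicates.
        Map<String, Set<Integer>> table = new LinkedHashMap<>();
        for (String word : listA) {
            table.putIfAbsent(word, new LinkedHashSet<>());
        }
        // Loop through the second list and record the sentence index of each match.
        for (int i = 0; i < listB.length; i++) {
            for (String token : listB[i].split("\\s+")) {
                Set<Integer> seen = table.get(token);
                if (seen != null) {
                    seen.add(i); // a set, so a repeat within one sentence adds nothing
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        String[] listA = {"the", "you", "how"};
        String[] listB = {"the dog ate the food", "how is the weather", "how are you"};
        // The number of distinct sentences per word is the occurrence count.
        sentencesByWord(listA, listB)
            .forEach((word, seen) -> System.out.println(word + " -> " + seen.size()));
    }
}
```

Because the value is a set of sentence indexes, "the" in "the dog ate the food" is recorded once for that sentence. If only the counts are needed, the sets could be replaced by a counter plus a per-sentence set of words already seen.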
