
How to print just the top 10 words and their frequency from a HashMap?

I have a problem with my code. I am reading in a file with some text, and then counting the words and their frequency. I am then trying to print out the top 10 most frequently used words in the text.

However, with the approaches I have tried (for example, a for loop set to stop after 10 iterations), the same words are simply printed 10 times over. Otherwise, I can print ALL the words in the file with their frequencies, but I only need the 10 most frequently used ones.

ArrayList<Integer> values = new ArrayList<>();
    values.addAll(wordcount.values());
    Collections.sort(values, Collections.reverseOrder());
    int last_i = -1;
    for(Integer i: values) {
        if (last_i == i)
            continue;
        last_i = i;
        System.out.println("The top 10 words are: ");
       // for (int count = 0; count < 10; count++) {
            for (String s : wordcount.keySet())
                if (wordcount.get(s) == i)
                    System.out.println(s + " : " + i);
           }
       }
}

Please find the "problematic" code above. I am using a BufferedReader to read in the text file and then removing all punctuation and anything else that may cause the same word to appear as two different entries in my HashMap.
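For reference, the reading and normalizing step described here could be sketched roughly as follows. The original reading code is not shown, so this is only an assumption about its shape: `WordCounter` and `countWords` are made-up names, and lowercasing plus splitting on anything that is not a letter, digit or apostrophe is just one possible normalization.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class WordCounter {
    // Hypothetical helper: counts words from any Reader (for a file, pass a FileReader).
    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> wordcount = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Lowercase so "The" and "the" share one entry, then split on
                // runs of characters that are not letters, digits or apostrophes.
                String[] words = line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+");
                for (String word : words) {
                    if (!word.isEmpty()) {
                        // merge() inserts 1 for a new word, otherwise adds 1 to the old count
                        wordcount.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wordcount;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the real text file in this sketch.
        Map<String, Integer> counts = countWords(new StringReader("The cat. The dog!\nthe END"));
        System.out.println(counts.get("the")); // prints 3
    }
}
```

Taking a `Reader` rather than a file name keeps the counting logic independent of where the text comes from.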

Any help is greatly appreciated. Thanks!
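If you would rather keep the structure of the original loop, a minimal sketch of one possible fix is below. It assumes the two main problems are that nothing counts how many words have been printed so far and that the header is printed inside the loop. It also swaps `==` for `equals` when comparing counts, because `==` on two `Integer` objects compares references and is only reliable for small cached values (-128 to 127), so words with larger counts may be missed. The names `TopByCount` and `topWords` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopByCount {
    // Hypothetical helper: returns at most `limit` "word : count" lines, highest counts first.
    static List<String> topWords(Map<String, Integer> wordcount, int limit) {
        List<String> lines = new ArrayList<>();
        if (limit <= 0) {
            return lines;
        }
        List<Integer> values = new ArrayList<>(wordcount.values());
        values.sort(Collections.reverseOrder());
        Integer last = null;
        for (Integer value : values) {
            if (value.equals(last)) {
                continue; // words with this count were already collected
            }
            last = value;
            for (String word : wordcount.keySet()) {
                // equals(), not ==, so counts above 127 still match
                if (wordcount.get(word).equals(value)) {
                    lines.add(word + " : " + value);
                    if (lines.size() == limit) {
                        return lines; // stop once enough words are collected
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordcount = new HashMap<>();
        wordcount.put("the", 300);
        wordcount.put("and", 300);
        wordcount.put("cat", 5);
        wordcount.put("dog", 1);
        System.out.println("The top 3 words are: "); // header printed once, outside the loop
        for (String line : topWords(wordcount, 3)) {
            System.out.println(line);
        }
    }
}
```

This still rescans the whole map once per distinct count, so sorting the entries directly, as in the answers below, is likely the simpler route for large inputs.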

Here is a Java 8 example with lambdas:

        Map<String, Integer> wordcount = new HashMap<>();
        wordcount.put("two", 20);
        wordcount.put("five", 50);
        wordcount.put("three", 30);
        wordcount.put("four", 40);
        wordcount.put("one", 10);
        wordcount.put("six", 60);
        wordcount.put("eight", 80);
        wordcount.put("twelve", 1);
        wordcount.put("nine", 90);
        wordcount.put("ten", 100);
        wordcount.put("seven", 70);
        wordcount.put("eleven", 1);
        wordcount.put("15", 1);
        wordcount.put("13", 2);
        wordcount.put("16", 4);
        wordcount.put("14", 3);
        wordcount.entrySet()
                .stream()
                .sorted(Map.Entry.comparingByValue(Collections.reverseOrder()))
                .limit(10)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (e1, e2) -> e1,
                        LinkedHashMap::new
                )).forEach((s, integer) -> System.out.println(String.format("%s : %s", s, integer)));

Should print something like:

ten : 100
nine : 90
eight : 80
seven : 70
six : 60
five : 50
four : 40
three : 30
two : 20
one : 10
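One caveat: a HashMap has no defined iteration order, so when several words share the same count, which of them makes the cut (and in what order) is not guaranteed. A sketch of one way to make the result deterministic is to add a secondary sort on the word itself. The class and method names here (`TopWords`, `topN`) are hypothetical, and alphabetical tie-breaking is just one reasonable choice.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopWords {
    // Hypothetical helper: the n most frequent entries, ties broken alphabetically by word.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> wordcount, int n) {
        return wordcount.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> wordcount = new HashMap<>();
        wordcount.put("a", 2);
        wordcount.put("b", 5);
        wordcount.put("c", 5);
        wordcount.put("d", 1);
        // Prints "b : 5", "c : 5", "a : 2" in that order
        for (Map.Entry<String, Integer> entry : topN(wordcount, 3)) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

Returning a list instead of printing directly also makes it easy to reuse the result elsewhere.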

First, count how many times each word occurs in the text using a Map. Then sort the map entries by count in descending order, so that the most frequent words come first, and print the first ten entries.

public void printTopTenWordsByFrequencyFrom(List<String> text) {
    Map<String, Integer> map = new HashMap<>();
    for(String word : text) {
        Integer times = map.get(word);
        if(times == null) {
            map.put(word, 1);
        } else {
            map.put(word, times + 1);
        }
    }
    map.entrySet().stream()
                  .sorted((one, another) -> another.getValue().compareTo(one.getValue())) // sort entries in descending order of count
                  .limit(10)
                  .forEach(entry -> System.out.println(entry.getKey() + " : " + entry.getValue()));
}

Version without streams and lambdas:

    public void printTopTenWordsByFrequencyFrom(List<String> text) {
        Map<String, Integer> map = new HashMap<>();
        for(String word : text) {
            Integer times = map.get(word);
            if(times == null) {
                map.put(word, 1);
            } else {
                map.put(word, times + 1);
            }
        }
        List<Map.Entry<String, Integer>> statistics = new ArrayList<>(map.entrySet());
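        Collections.sort(statistics, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> one, Map.Entry<String, Integer> another) {
                // Descending order: compare the second entry's count against the first's
                return another.getValue().compareTo(one.getValue());
            }
        });
        // subList's end index is exclusive; also guard against fewer than ten distinct words
        List<Map.Entry<String, Integer>> topTen = statistics.subList(0, Math.min(10, statistics.size()));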
        for(Map.Entry<String, Integer> word : topTen) {
            System.out.println(word.getKey() + " : " + word.getValue());
        }
    }
