
How to sort the words by their frequency

I take an input text file, convert it to an array, sort the array, and then get the frequency of each word. I can't figure out how to sort the words by frequency, from highest to lowest, while keeping imports to a minimum:

//find frequencies
    int count = 0;
    List<String> list = new ArrayList<>();
    for(String s:words){
        if(!list.contains(s)){
            list.add(s);
        }
    }
    for(int i=0;i<list.size();i++){
        for(int j=0;j<words.length;j++){
            if(list.get(i).equals(words[j])){
                count++;
            }
        }

        System.out.println(list.get(i) + "\t" + count);
        count=0;
    }

This prints the words with their frequencies in unsorted order, for example:

the 3
with 7
he 8

etc.

I want this to be sorted like:

he 8
with 7
the 3

I would suggest using a small helper class:

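class WordFreq implements Comparable<WordFreq> {
   final String word;
   int freq;
   WordFreq(String word, int freq) {
     // the final field must be assigned in a constructor
     this.word = word;
     this.freq = freq;
   }
   @Override public int compareTo(WordFreq that) {
     // arguments swapped so that higher frequencies sort first
     return Integer.compare(that.freq, this.freq);
   }
}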

Build an array of instances of this class, one for each word, then sort the array using Arrays.sort.
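
As a rough sketch of that idea, the class below counts the words with a plain list lookup and then sorts the resulting array. The names WordFreqDemo and countAndSort are made up for illustration, and the comparison is written so that the highest count comes first, which is an assumption based on what the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordFreqDemo {
    // Pairs a word with its count; compareTo puts the highest count first.
    static class WordFreq implements Comparable<WordFreq> {
        final String word;
        int freq;

        WordFreq(String word) {
            this.word = word;
            this.freq = 1;
        }

        @Override
        public int compareTo(WordFreq that) {
            // Swapped arguments give descending order of frequency.
            return Integer.compare(that.freq, this.freq);
        }

        @Override
        public String toString() {
            return word + "\t" + freq;
        }
    }

    // Hypothetical helper: one WordFreq per distinct word, sorted by count.
    static WordFreq[] countAndSort(String[] words) {
        List<WordFreq> found = new ArrayList<>();
        for (String s : words) {
            WordFreq match = null;
            for (WordFreq wf : found) {
                if (wf.word.equals(s)) {
                    match = wf;
                    break;
                }
            }
            if (match == null) {
                found.add(new WordFreq(s));
            } else {
                match.freq++;
            }
        }
        WordFreq[] result = found.toArray(new WordFreq[0]);
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"the", "he", "with", "he", "with", "he"};
        for (WordFreq wf : countAndSort(words)) {
            System.out.println(wf);
        }
    }
}
```

The linear lookup keeps the imports small but is quadratic in the number of distinct words; a Map would be the usual choice for larger inputs.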

I implemented it like so:

private static class Tuple implements Comparable<Tuple> {
    private int count;
    private String word;

    public Tuple(int count, String word) {
        this.count = count;
        this.word = word;
    }

    @Override
    public int compareTo(Tuple o) {
        // Integer.compare avoids boxing and the deprecated Integer constructor
        return Integer.compare(this.count, o.count);
    }
    public String toString() {
        return word + " " + count;
    }
}

public static void main(String[] args) {
    String[] words = { "the", "he", "he", "he", "he", "he", "he", "he",
            "he", "the", "the", "with", "with", "with", "with", "with",
            "with", "with" };
    // find frequencies
    Arrays.sort(words);
    Map<String, Integer> map = new HashMap<String, Integer>();
    for (String s : words) {
        if (map.containsKey(s)) {
            map.put(s, map.get(s) + 1);
        } else {
            map.put(s, 1);
        }
    }
    List<Tuple> al = new ArrayList<Tuple>();
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        al.add(new Tuple(entry.getValue(), entry.getKey()));
    }
    Collections.sort(al);
    System.out.println(al);
}

Output is:

[the 3, with 7, he 8]
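
That output is in ascending order, while the question asks for the highest count first. One way to flip it without touching compareTo is to pass Collections.reverseOrder() to the sort. The sketch below uses a stand-in Tuple class of its own so that it compiles by itself; ReverseOrderDemo and sortedDescending are illustrative names, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReverseOrderDemo {
    // Stand-in for the Tuple class above: natural order is ascending by count.
    static class Tuple implements Comparable<Tuple> {
        final int count;
        final String word;

        Tuple(int count, String word) {
            this.count = count;
            this.word = word;
        }

        @Override
        public int compareTo(Tuple o) {
            return Integer.compare(this.count, o.count);
        }

        @Override
        public String toString() {
            return word + " " + count;
        }
    }

    // Returns a copy sorted from highest to lowest count.
    static List<Tuple> sortedDescending(List<Tuple> tuples) {
        List<Tuple> copy = new ArrayList<>(tuples);
        // reverseOrder() inverts the natural (ascending) ordering.
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<Tuple> al = new ArrayList<>();
        al.add(new Tuple(3, "the"));
        al.add(new Tuple(7, "with"));
        al.add(new Tuple(8, "he"));
        System.out.println(sortedDescending(al)); // prints [he 8, with 7, the 3]
    }
}
```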

You should create an object of type Word that holds the word's String value and its frequency.

Then you can implement compareTo or use a Comparator and call Collections.sort() on your list of type Word.
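
A minimal sketch of the Comparator variant might look like this. The field names text and frequency, the constant BY_FREQUENCY_DESC, and the alphabetical tie-break are all assumptions added for illustration; the answer only specifies a Word type holding a String and a frequency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class WordComparatorDemo {
    // Plain value holder; ordering lives in the Comparator below.
    static class Word {
        final String text;
        final int frequency;

        Word(String text, int frequency) {
            this.text = text;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return text + " " + frequency;
        }
    }

    // Highest frequency first; equal frequencies fall back to alphabetical order.
    static final Comparator<Word> BY_FREQUENCY_DESC = new Comparator<Word>() {
        @Override
        public int compare(Word a, Word b) {
            int byCount = Integer.compare(b.frequency, a.frequency);
            return byCount != 0 ? byCount : a.text.compareTo(b.text);
        }
    };

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        words.add(new Word("the", 3));
        words.add(new Word("with", 7));
        words.add(new Word("he", 8));
        words.add(new Word("a", 3));
        Collections.sort(words, BY_FREQUENCY_DESC);
        System.out.println(words); // prints [he 8, with 7, a 3, the 3]
    }
}
```

Keeping the ordering in a separate Comparator leaves Word free of any built-in sort order, so the same list can later be sorted alphabetically or ascending without changing the class.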

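Use a Map<String, Integer> instead, storing the String as the key and the frequency as the value, with an initial value of 1. If the word already exists, just increase the value by 1. Then convert this map into a Map<Integer, List<String>> (or a Guava Multimap), using the Integer values as keys and storing the String keys as values.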
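
The two steps could be sketched roughly as follows, using only java.util. The TreeMap with a reversed comparator is an assumption added here so that iteration runs from the highest count to the lowest; the answer itself only asks for a Map<Integer, List<String>>. GroupByFrequencyDemo and groupByFrequency are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByFrequencyDemo {
    // Hypothetical helper: counts the words, then groups them by count.
    static Map<Integer, List<String>> groupByFrequency(String[] words) {
        // Step 1: word -> count, starting at 1 and incrementing on repeats.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String s : words) {
            Integer old = counts.get(s);
            counts.put(s, old == null ? 1 : old + 1);
        }

        // Step 2: count -> words, with keys iterated from highest to lowest.
        Map<Integer, List<String>> byCount =
                new TreeMap<Integer, List<String>>(Collections.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            List<String> bucket = byCount.get(e.getValue());
            if (bucket == null) {
                bucket = new ArrayList<String>();
                byCount.put(e.getValue(), bucket);
            }
            bucket.add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        String[] words = {"the", "he", "with", "he", "with", "he"};
        for (Map.Entry<Integer, List<String>> e : groupByFrequency(words).entrySet()) {
            for (String w : e.getValue()) {
                System.out.println(w + "\t" + e.getKey());
            }
        }
    }
}
```

Words that share a count land in the same list; their order within that list follows HashMap iteration and is not guaranteed.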
