How to sort the words by their frequency
I take an input text file, convert it to an array, sort the array, and then get the frequency of each word. I can't figure out how to sort the words according to their frequencies, from highest to lowest, without importing lots of things (which is what I am trying to do):
// find frequencies
int count = 0;
List<String> list = new ArrayList<>();
for (String s : words) {
    if (!list.contains(s)) {
        list.add(s);
    }
}
for (int i = 0; i < list.size(); i++) {
    for (int j = 0; j < words.length; j++) {
        if (list.get(i).equals(words[j])) {
            count++;
        }
    }
    System.out.println(list.get(i) + "\t" + count);
    count = 0;
}
This prints the words with their frequencies in unsorted order, for example:
the 3
with 7
he 8
etc.
I want this to be sorted like:
he 8
with 7
the 3
I would suggest using a small helper class:
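class WordFreq implements Comparable<WordFreq> {
    final String word;
    int freq;

    // A final field must be assigned, so a constructor is needed.
    WordFreq(String word, int freq) {
        this.word = word;
        this.freq = freq;
    }

    // Natural ordering: ascending by frequency.
    @Override public int compareTo(WordFreq that) {
        return Integer.compare(this.freq, that.freq);
    }
}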
Build an array of instances of this class, one for each word, then sort the array using Arrays.sort. Since compareTo orders by ascending frequency, pass Collections.reverseOrder() as the second argument to get the highest frequency first.
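The suggestion above can be sketched roughly as follows. This is only an illustration under a few assumptions: the class name WordFreqArraySketch and the method sortedByFrequency are hypothetical, the counts are gathered with a LinkedHashMap (the answer does not say how to count), and the sample words are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper class, used only for this sketch.
class WordFreqArraySketch {
    // Helper in the spirit of the WordFreq class from the answer.
    static class WordFreq implements Comparable<WordFreq> {
        final String word;
        final int freq;

        WordFreq(String word, int freq) {
            this.word = word;
            this.freq = freq;
        }

        // Natural ordering: ascending by frequency.
        @Override
        public int compareTo(WordFreq that) {
            return Integer.compare(this.freq, that.freq);
        }
    }

    // Counts each word, then returns one WordFreq per distinct word,
    // sorted from highest to lowest frequency.
    static WordFreq[] sortedByFrequency(String[] words) {
        // Assumption: a LinkedHashMap keeps first-seen order for ties.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        WordFreq[] result = new WordFreq[counts.size()];
        int i = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result[i++] = new WordFreq(e.getKey(), e.getValue());
        }
        // Reverse the natural (ascending) ordering.
        Arrays.sort(result, Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"the", "he", "with", "he", "with", "he"};
        for (WordFreq wf : sortedByFrequency(words)) {
            System.out.println(wf.word + "\t" + wf.freq);
        }
    }
}
```

Everything here comes from java.util, so nothing beyond the standard library is needed.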
I implemented it like so:
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The enclosing class name is arbitrary.
public class WordCount {
    private static class Tuple implements Comparable<Tuple> {
        private final int count;
        private final String word;

        public Tuple(int count, String word) {
            this.count = count;
            this.word = word;
        }

        // Natural ordering: ascending by count.
        @Override
        public int compareTo(Tuple o) {
            return Integer.compare(this.count, o.count);
        }

        @Override
        public String toString() {
            return word + " " + count;
        }
    }

    public static void main(String[] args) {
        String[] words = { "the", "he", "he", "he", "he", "he", "he", "he",
                "he", "the", "the", "with", "with", "with", "with", "with",
                "with", "with" };
        // Find frequencies.
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String s : words) {
            if (map.containsKey(s)) {
                map.put(s, map.get(s) + 1);
            } else {
                map.put(s, 1);
            }
        }
        // Wrap each entry in a Tuple so it can be sorted by count.
        List<Tuple> al = new ArrayList<Tuple>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            al.add(new Tuple(entry.getValue(), entry.getKey()));
        }
        // Reverse the natural ordering to get the highest count first.
        Collections.sort(al, Collections.reverseOrder());
        System.out.println(al);
    }
}
Output is:
[he 8, with 7, the 3]
You should create an object of type Word that holds the word's String value and its frequency.
Then you can implement compareTo or use a Comparator and call Collections.sort() on your list of type Word.
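The Comparator variant of this idea might look like the sketch below. The answer only names a Word type, so the field names (value, frequency), the wrapper class WordComparatorSketch, the constant BY_FREQUENCY_DESC, and the sample data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical wrapper class, used only for this sketch.
class WordComparatorSketch {
    // Assumed shape of the Word type: a String value plus its frequency.
    static class Word {
        final String value;
        final int frequency;

        Word(String value, int frequency) {
            this.value = value;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return value + " " + frequency;
        }
    }

    // Orders words from highest to lowest frequency.
    static final Comparator<Word> BY_FREQUENCY_DESC =
            Comparator.comparingInt((Word w) -> w.frequency).reversed();

    // Returns a sorted copy and leaves the input list untouched.
    static List<Word> sortedCopy(List<Word> input) {
        List<Word> copy = new ArrayList<>(input);
        Collections.sort(copy, BY_FREQUENCY_DESC);
        return copy;
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        words.add(new Word("the", 3));
        words.add(new Word("with", 7));
        words.add(new Word("he", 8));
        System.out.println(sortedCopy(words));
    }
}
```

Using a separate Comparator leaves Word without a natural ordering, which can be convenient if the same list sometimes needs to be sorted alphabetically instead.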
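Use a Map<String, Integer> instead, storing each String as the key and its frequency as the value, with an initial value of 1. If the word already exists, just increment the value by 1. Then convert this map into a Map<Integer, List<String>> (or a Guava Multimap), using the Integer values as keys and storing the String keys as values.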
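A minimal sketch of this map inversion, using only java.util, could look like the following. The answer does not say which Map implementation to use for the inverted map; a TreeMap with a reversed comparator is assumed here so that iteration runs from the highest frequency down. The class name FrequencyGroupingSketch and the method groupByFrequency are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper class, used only for this sketch.
class FrequencyGroupingSketch {
    // Counts the words, then groups them by frequency, highest first.
    static Map<Integer, List<String>> groupByFrequency(String[] words) {
        // Step 1: word -> frequency, starting at 1 and incrementing.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            Integer old = counts.get(w);
            counts.put(w, old == null ? 1 : old + 1);
        }
        // Step 2: frequency -> words with that frequency.
        // Assumption: a TreeMap in reverse order yields descending keys.
        Map<Integer, List<String>> byFrequency =
                new TreeMap<Integer, List<String>>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byFrequency
                    .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }
        return byFrequency;
    }

    public static void main(String[] args) {
        String[] words = {"the", "he", "he", "with", "with", "with"};
        for (Map.Entry<Integer, List<String>> e : groupByFrequency(words).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The List<String> value is what allows several words to share the same frequency; a plain Map<Integer, String> would silently keep only one of them.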