Java - Most suitable data structure for finding the most frequent element

My program contains algorithms that output text (Strings). Eventually I want to print out the word that occurs most often, but before I can do that, I need to store the words in a data structure. Which data structure is best (simple and efficient) for storing Strings and then obtaining the most frequent one? I don't want to use any libraries. Thanks

I don't think any single data structure does exactly this, but here is how I would do it.

Maintain a Map<String, Integer> from each word to the number of times it has been encountered, and as you update the map, keep track of the word with the largest count so far. For example:

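import java.util.HashMap;
import java.util.Map;

String maxWord = null;  // most frequent word seen so far
int maxCount = 0;       // its count
Map<String, Integer> wordCount = new HashMap<>();
for (String str : getMyProgramOutput()) {  // getMyProgramOutput(): placeholder for your program's words
  int count = wordCount.getOrDefault(str, 0) + 1;  // previous count (0 if new) plus this occurrence
  wordCount.put(str, count);
  if (count > maxCount) {  // this word has just taken the lead
    maxWord = str;
    maxCount = count;
  }
}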

Create a Map<String, Integer>. Every time you encounter a String, increment its Integer (you might have to create your own MutableInteger class, since java.lang.Integer is immutable). When you're finished, search the map for the largest count (or keep a running maximum as you go).
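A minimal sketch of the MutableInteger variant, assuming the words arrive as a List<String>. MutableInteger, mostFrequent and MutableCountDemo are hypothetical names chosen for illustration; Map.computeIfAbsent (Java 8+) and List.of (Java 9+) are standard library calls. The counter is bumped in place, so the map never needs a second put for an existing word.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableCountDemo {
    // Hypothetical helper: a counter that can be incremented in place,
    // unlike java.lang.Integer, which is immutable.
    static final class MutableInteger {
        int value; // starts at 0
        void increment() { value++; }
    }

    // Counts every word, then scans the map once for the largest count.
    static String mostFrequent(List<String> words) {
        Map<String, MutableInteger> counts = new HashMap<>();
        for (String w : words) {
            // computeIfAbsent creates a zero counter the first time a word is seen
            // and returns the existing counter on later occurrences.
            counts.computeIfAbsent(w, k -> new MutableInteger()).increment();
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, MutableInteger> e : counts.entrySet()) {
            if (e.getValue().value > bestCount) {
                best = e.getKey();
                bestCount = e.getValue().value;
            }
        }
        return best; // null if the list was empty
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent(List.of("a", "b", "a", "c", "a", "b"))); // prints "a"
    }
}
```

On a tie, whichever word HashMap happens to iterate first wins, because HashMap makes no ordering guarantee; keeping a running maximum during counting instead returns the first word to reach the top count.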

Why don't you just build a max heap in which each node holds the String and its occurrence count (integer_occurrence)? To get the most frequent word, take the root of the heap.
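A sketch of the max-heap idea using java.util.PriorityQueue with a reversed comparator, assuming the counts are computed first with a HashMap (a binary heap has no efficient way to locate an existing word and raise its count). WordCount, buildMaxHeap and MaxHeapDemo are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class MaxHeapDemo {
    // One heap node: a word together with its occurrence count.
    static final class WordCount {
        final String word;
        final int count;
        WordCount(String word, int count) { this.word = word; this.count = count; }
    }

    static PriorityQueue<WordCount> buildMaxHeap(List<String> words) {
        // Count first, since the heap cannot cheaply find and update an existing node.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Compare b to a so the largest count sits at the root (max heap).
        PriorityQueue<WordCount> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(b.count, a.count));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.add(new WordCount(e.getKey(), e.getValue()));
        }
        return heap;
    }

    public static void main(String[] args) {
        PriorityQueue<WordCount> heap = buildMaxHeap(List.of("x", "y", "x", "z", "y", "x"));
        System.out.println(heap.peek().word + " " + heap.peek().count); // prints "x 3"
        heap.poll(); // remove the root to expose the next most frequent word
        System.out.println(heap.peek().word + " " + heap.peek().count); // prints "y 2"
    }
}
```

peek() reads the root in constant time and each poll() costs O(log n), so the heap pays off mainly when you want the top k words; for the single most frequent word, a linear scan of the counts is enough.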

You might want to consider using a dictionary table in a database, because this kind of data normally has to be persisted to physical storage so that it is not lost when the system reboots. In that case a dictionary is helpful: all you need to do is set up a dictionary table plus other table(s) for storing information such as frequency and position.

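Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.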
