
Problem sorting ConcurrentHashMap by values using java.util.Collections.sort() in Java

I have this code, which prints a list of words sorted by key (alphabetically) from counts, my ConcurrentHashMap that stores words as keys and their frequencies as values.

    // Method to create a stopword list with the most frequent words from the lemmas key in the json file
    private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {

        // counts stores each word and its frequency
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();

        // corpus is an array list for all the individual words
        ArrayList<String> corpus = new ArrayList<String>();

        for (Entry<String, String> entry : lemmas.entrySet()) {
            String line = entry.getValue().toLowerCase();
            line = line.replaceAll("\\p{Punct}", " ");
            line = line.replaceAll("\\d+", " ");
            line = line.replaceAll("\\s+", " ");
            line = line.trim();
            String[] value = line.split(" ");

            List<String> words = new ArrayList<String>(Arrays.asList(value));
            corpus.addAll(words);
        }

        // count all the words in the corpus and store each word with its frequency in counts
        for (String word : corpus) {
            if (counts.keySet().contains(word)) {
                counts.put(word, counts.get(word) + 1);
            } else {
                counts.put(word, 1);
            }
        }

        // Create a list to store all the words with their frequency and sort it by values.
        List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());

        List<String> stopwordslist = new ArrayList<>(counts.keySet()); // this works but counts.values() gives an error
        Collections.sort(stopwordslist);
        System.out.println("List after sorting: " + stopwordslist);

So the output is:

List after sorting: [a, abruptly, absent, abstractmap, accept,...]

How can I sort them by values as well? When I use List<String> stopwordslist = new ArrayList<>(counts.values());

I get an error:

- Cannot infer type arguments for ArrayList<>

I guess that is because ArrayList can store <String> but not <String,Integer> and it gets confused.

I have also tried to do it with a custom Comparator like so:

           Comparator<Entry<String, Integer>> valueComparator = new Comparator<Entry<String,Integer>>() {
               @Override
               public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
                   String v1 = e1.getValue();
                   String v2 = e2.getValue();
                   return v1.compareTo(v2);
               }
           };  
           
           
           List<Entry<String, Integer>> stopwordslist = new ArrayList<Entry<String, Integer>>();
           // sorting HashMap by values using comparator 
           Collections.sort(counts, valueComparator)

which gives me another error:

The method sort(List<T>, Comparator<? super T>) in the type Collections is not applicable for the arguments (ConcurrentHashMap<String,Integer>, Comparator<Map.Entry<String,Integer>>)

How can I sort my list by values?

My expected output is something like

[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, specified, u, function, and, ...]

Let's go through all the issues in your code.

  1. Naming conventions. Method names should start with a lowercase letter.

  2. Unnecessary use of ConcurrentHashMap. For purely local use, as within your method, an ordinary HashMap will do. For parameters, just use the Map interface, to allow the caller to use whatever Map implementation fits.

  3. Unnecessarily iterating over the entrySet(). When you're only interested in the values, you don't need to use entrySet() and call getValue() on every entry; you can iterate over values() in the first place. Likewise, you would use keySet() when you're interested in the keys only. Only iterate over entrySet() when you need both key and value (or want to perform updates).

  4. Don't replace pattern matches with spaces only to split on the spaces afterwards. Pass the (combined) pattern directly to split, i.e. line.split("[\\p{Punct}\\d\\s]+").

  5. Don't use List<String> words = new ArrayList<String>(Arrays.asList(value)); unless you specifically need the features of an ArrayList. Otherwise, just use List<String> words = Arrays.asList(value);
    But when the only thing you're doing with the list is calling addAll on another collection, you can use Collections.addAll(corpus, value); without the List detour.

  6. Don't use counts.keySet().contains(word), as you can simply use counts.containsKey(word). But you can simplify the entire

        if (counts.containsKey(word)) {
            counts.put(word, counts.get(word) + 1);
        } else {
            counts.put(word, 1);
        }

    to

        counts.merge(word, 1, Integer::sum);

  7. The points above yield

        ArrayList<String> corpus = new ArrayList<>();
        for (String line : lemmas.values()) {
            String[] value = line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+");
            Collections.addAll(corpus, value);
        }
        for (String word : corpus) {
            counts.merge(word, 1, Integer::sum);
        }

    But there is no point in performing two loops, the first only to store everything in a potentially large list that is then iterated over a single time. You can perform the second loop's operation right in the first (now the only) loop and get rid of the list.

        for (String line : lemmas.values()) {
            for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }

  8. Your first variant already acknowledges that a map can't be sorted directly, since it copies the map into a list and sorts the list. In the second variant, you created a List<Entry<String, Integer>> but then didn't use it at all and instead tried to pass the map to sort. (By the way, since Java 8, you can invoke sort directly on a List; there is no need to call Collections.sort.)
    You have to keep copying the map data into a list and sorting the list. For example,

        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort(Map.Entry.comparingByValue());

    Now, you have to decide whether you change the return type to List<Map.Entry<String, Integer>> or copy the keys of the sorted entries to a new list.
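The counting step from points 4, 6 and 7 can be tried in isolation. The following is only a minimal sketch: the class name WordCountSketch, the method countWords, and the sample lines are made up for illustration and are not part of the original code. It also skips empty tokens, because String.split returns a leading empty string when the input starts with a delimiter such as a digit or punctuation character (trim() only removes whitespace).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {

    // Counts words across all lines; tokens are separated by runs of
    // punctuation, digits, or whitespace (the combined pattern from point 4).
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("[\\p{Punct}\\d\\s]+")) {
                if (word.isEmpty()) {
                    continue; // leading "" appears when the line starts with a delimiter
                }
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        List<String> lines = Arrays.asList("The key, the value.", "42 keys: the map");
        System.out.println(countWords(lines));
    }
}
```

Whether empty tokens should be skipped or counted depends on what the stopword list is meant to contain; the check is there only so that "" does not show up as a word.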

Taking all points together and staying with the original return type, the fixed code looks like

private static List<String> stopWordsFile(Map<String, String> lemmas) {
    Map<String, Integer> counts = new HashMap<>();

    for(String line: lemmas.values()) {
        for(String word: line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());         
    list.sort(Map.Entry.comparingByValue());

    List<String> stopwordslist = new ArrayList<>();
    for(Map.Entry<String, Integer> e: list) stopwordslist.add(e.getKey());

//    System.out.println("List after sorting: " + stopwordslist);
    return stopwordslist;
}
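Note that Map.Entry.comparingByValue() sorts in ascending order, so the least frequent words come first. The expected output in the question starts with the most frequent words, which suggests a descending order is wanted. A possible variant is sketched below; the class name SortByFrequencySketch, the method keysByDescendingCount, and the sample counts are assumptions made for illustration, and the alphabetical tie-breaker is an extra design choice so that words with equal counts come out in a predictable order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByFrequencySketch {

    // Returns the keys ordered from highest to lowest count;
    // keys with equal counts are ordered alphabetically.
    static List<String> keysByDescendingCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(
                Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()));

        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical sample counts, not taken from the question.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("of", 3);
        counts.put("map", 3);
        counts.put("a", 1);
        System.out.println(keysByDescendingCount(counts)); // prints [the, map, of, a]
    }
}
```

In stopWordsFile, the same effect could be had by replacing the list.sort(...) line with the comparator used here.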
