Comparing keys in HashMap and Values

I have a HashMap as follows:

HashMap<String, Integer> BC = new HashMap<String, Integer>();

which stores "tokens/tags" as keys and the "frequency of each token/tag" as values.

Example:

"the/at" 153
"that/cs" 45
"Ann/np" 3

I now go through each key and check whether the same token, say "the", is associated with more than one tag. If it is, I keep the tag with the largest count.

Example:

"the/at" 153
"the/det" 80

Then I take the key "the/at" with value 153.

The code that I have written to do so is as follows:

private HashMap<String, Integer> Unigram_Tagger = new HashMap<String, Integer>();

for(String curr_key: BC.keySet())
        {
            for(String next_key: BC.keySet())
            {
                if(curr_key.equals(next_key))
                    continue;
                else
                {
                    String[] split_key_curr_key = curr_key.split("/");
                    String[] split_key_next_key = next_key.split("/");

                    //out.println("CK- " + curr_key + ", NK- " + next_key);

                    if(split_key_curr_key[0].equals(split_key_next_key[0]))
                    {
                        int ck_v = 0, nk_v = 0;
                        ck_v = BC.get(curr_key);
                        nk_v = BC.get(next_key);

                        if(ck_v > nk_v)
                            Unigram_Tagger.put(curr_key, BC.get(curr_key));
                        else
                            Unigram_Tagger.put(next_key, BC.get(next_key));
                    }
                }
            }
        }

But this code takes too long to compute, since the original HashMap 'BC' has 68442 entries, so the nested loops run approximately 68442 × 68442 = 4684307364 times (plus some more).

My question is this: can I produce the same output using a more efficient method?

Thanks!

Create a new

Map<String,Integer> highCount = new HashMap<>();

that will map tokens to their largest count.

Make a single pass through the keys.

Split each key into its token and tag.

For each token, look in highCount. If the token is not present, add it with its count. If the entry already exists and the current count is greater than the previous maximum, replace the maximum in the map.

When you are done with the single pass, highCount will contain all the unique tokens along with the highest count seen for each token.

Note: This answer is intended to give you a starting point from which to develop a complete solution. The key concept is that you create and populate a new map from token to some "value" type (not necessarily just Integer) that provides you with the functionality you need. Most likely the value type will be a new custom class that stores the tag and the count.
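The single pass described above can be sketched as follows. This is a minimal illustration, not a complete solution: the class name HighCountSketch and the method highestCounts are hypothetical, and it assumes that a key without a "/" should be treated as a token with no tag. It relies on Map.merge with Integer::max, which inserts the count when the token is new and otherwise keeps the larger of the two values.

```java
import java.util.HashMap;
import java.util.Map;

class HighCountSketch {

    // Returns token -> highest count seen for that token across all of its tags.
    static Map<String, Integer> highestCounts(Map<String, Integer> bc) {
        Map<String, Integer> highCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : bc.entrySet()) {
            String key = e.getKey();
            int slash = key.indexOf('/');
            // Assumption: a key without '/' is treated as a token with no tag.
            String token = slash < 0 ? key : key.substring(0, slash);
            // Insert the count if the token is new, otherwise keep the larger value.
            highCount.merge(token, e.getValue(), Integer::max);
        }
        return highCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> bc = new HashMap<>();
        bc.put("the/at", 153);
        bc.put("the/det", 80);
        bc.put("that/cs", 45);
        bc.put("Ann/np", 3);
        System.out.println(highestCounts(bc));
    }
}
```

Each entry is visited once, so the work grows linearly with the size of the map instead of quadratically. Note that this version records only the count; to recover the winning tag as well, the value type has to carry it, as the note above suggests.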

The slowest part of your current method is the pairwise comparison of keys. First, define a Tuple class:

public class Tuple<X, Y> { 
  public final X x; 
  public final Y y; 
  public Tuple(X x, Y y) { 
    this.x = x; 
    this.y = y; 
  } 
} 

Then you can try an algorithm that does the following:

  1. Initialize a new HashMap<String, Tuple<String, Integer>> result.
  2. Given an input pair (key, value) from the old map, where key = "a/b", check whether result.keySet().contains(a) and result.keySet().contains(b).
  3. If neither a nor b is present, call result.put(a, new Tuple<String, Integer>(b, value)) and result.put(b, new Tuple<String, Integer>(a, value)).
  4. If a is present, compare value with v = result.get(a).y (the count stored in the tuple). If value > v, remove a and b from result and do step 3. Do the same for b. Otherwise, move on to the next key-value pair.

After you have iterated through the old hash map and inserted everything, you can easily reconstruct the output you want by transforming the key-value pairs in result.
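As a rough sketch of this idea, the version below is a simplification rather than a literal transcription of the steps: it indexes result by token only (the "a" part of "a/b"), on the assumption that the question needs just the most frequent tag per token. The names TupleTaggerSketch and bestTags are hypothetical, and keys without a "/" are skipped as an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class TupleTaggerSketch {

    // Same shape as the Tuple class above, nested here so the sketch is self-contained.
    static final class Tuple<X, Y> {
        final X x;
        final Y y;
        Tuple(X x, Y y) {
            this.x = x;
            this.y = y;
        }
    }

    // Returns "token/tag" -> count, keeping only the most frequent tag per token.
    static Map<String, Integer> bestTags(Map<String, Integer> bc) {
        // token -> (best tag so far, its count)
        Map<String, Tuple<String, Integer>> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : bc.entrySet()) {
            String[] parts = e.getKey().split("/", 2);
            if (parts.length < 2) {
                continue; // assumption: ignore keys that carry no tag
            }
            String token = parts[0];
            String tag = parts[1];
            int value = e.getValue();
            Tuple<String, Integer> best = result.get(token);
            if (best == null || value > best.y) {
                result.put(token, new Tuple<>(tag, value));
            }
        }
        // Rebuild keys in the original "token/tag" form.
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Tuple<String, Integer>> e : result.entrySet()) {
            out.put(e.getKey() + "/" + e.getValue().x, e.getValue().y);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> bc = new HashMap<>();
        bc.put("the/at", 153);
        bc.put("the/det", 80);
        bc.put("that/cs", 45);
        System.out.println(bestTags(bc));
    }
}
```

When two tags of the same token have equal counts, this sketch keeps whichever one the HashMap iteration happens to visit first.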

A basic thought on the algorithm:

  1. You should get the entrySet() of the HashMap and convert it to a List:

     ArrayList<Map.Entry<String, Integer>> list = new ArrayList<>(map.entrySet()); 
  2. Now you should sort the list by key in alphabetical order. We do that because the HashMap has no order, so the corresponding keys might be far apart. After sorting, all related keys are directly next to each other.

     Collections.sort(list, Comparator.comparing(e -> e.getKey())); 

    The entries "the/at" and "the/det" will be next to each other, thanks to the alphabetical sort.

  3. Now you can iterate over the entire list while remembering the best item, until you find a better one or you reach the first item that does not have the same prefix (e.g. "the").

     ArrayList<Map.Entry<String, Integer>> bestList = new ArrayList<>();
     // The first entry of the list is considered the current best item for its group
     // (this assumes the list is not empty)
     Map.Entry<String, Integer> currentBest = list.get(0);
     String key = currentBest.getKey();
     String currentPrefix = key.substring(0, key.indexOf('/'));

     for (int i = 1; i < list.size(); i++) {
         // The item we compare the current best with
         Map.Entry<String, Integer> next = list.get(i);
         String nkey = next.getKey();
         String nextPrefix = nkey.substring(0, nkey.indexOf('/'));

         if (currentPrefix.equals(nextPrefix)) {
             // Same prefix: keep whichever item has the larger value
             if (currentBest.getValue() < next.getValue()) {
                 currentBest = next;
             }
         } else {
             // Different prefix: store the best of the finished group and
             // start the next group with the current item
             bestList.add(currentBest);
             currentBest = next;
             currentPrefix = nextPrefix;
         }
     }
     // The last one must be added here, or we would forget it
     bestList.add(currentBest);

  4. Now you should have a list of Map.Entry objects representing the desired entries. The overall complexity is O(n log n), dominated by the sort, while grouping and collecting the items is O(n).
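The steps above can be assembled into one runnable class, sketched below. The names SortAndScanSketch, prefixOf and bestPerPrefix are hypothetical, and the sketch assumes keys have the form "token/tag"; a key without "/" is treated as its own prefix. Sorting by the full key is enough to group tokens, because all keys that begin with the same "token/" are contiguous in lexicographic order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortAndScanSketch {

    // Assumption: keys look like "token/tag"; a key without '/' is its own prefix.
    static String prefixOf(String key) {
        int slash = key.indexOf('/');
        return slash < 0 ? key : key.substring(0, slash);
    }

    // Returns one entry per prefix: the entry with the highest value in its group.
    static List<Map.Entry<String, Integer>> bestPerPrefix(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(map.entrySet());
        List<Map.Entry<String, Integer>> bestList = new ArrayList<>();
        if (list.isEmpty()) {
            return bestList;
        }
        // Step 2: sort by key so that entries sharing a prefix become adjacent.
        list.sort(Map.Entry.comparingByKey());

        // Step 3: scan once, tracking the best entry of the current group.
        Map.Entry<String, Integer> currentBest = list.get(0);
        String currentPrefix = prefixOf(currentBest.getKey());
        for (int i = 1; i < list.size(); i++) {
            Map.Entry<String, Integer> next = list.get(i);
            String nextPrefix = prefixOf(next.getKey());
            if (currentPrefix.equals(nextPrefix)) {
                if (currentBest.getValue() < next.getValue()) {
                    currentBest = next;
                }
            } else {
                bestList.add(currentBest);
                currentBest = next;
                currentPrefix = nextPrefix;
            }
        }
        bestList.add(currentBest); // do not forget the last group
        return bestList;
    }

    public static void main(String[] args) {
        Map<String, Integer> bc = new HashMap<>();
        bc.put("the/at", 153);
        bc.put("the/det", 80);
        bc.put("that/cs", 45);
        bc.put("Ann/np", 3);
        System.out.println(bestPerPrefix(bc)); // [Ann/np=3, that/cs=45, the/at=153]
    }
}
```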

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Point {

    public static void main(String[] args) {
        HashMap<String, Integer> BC = new HashMap<>();
        //some random values
        BC.put("the/at",5);
        BC.put("Ann/npe",6);
        BC.put("the/atx",7);
        BC.put("that/cs",8);
        BC.put("the/aty",9);
        BC.put("Ann/np",1);
        BC.put("Ann/npq",2);
        BC.put("the/atz",3);
        BC.put("Ann/npz",4);
        BC.put("the/atq",0);
        BC.put("the/atw",12);
        BC.put("that/cs",14);
        BC.put("that/cs1",16);
        BC.put("the/at1",18);
        BC.put("the/at2",100);
        BC.put("the/at3",123);
        BC.put("that/det",153);  
        BC.put("xyx",123);
        BC.put("xyx/w",2);  
        System.out.println("\nUnsorted Map......");
        printMap(BC); 

        System.out.println("\nSorted Map......By Key"); 
        //sort original map using TreeMap, it will sort the Map by keys automatically.
        Map<String, Integer> sortedBC = new TreeMap<>(BC);
        printMap(sortedBC);
        // find all distinct prefixes by splitting the keys at "/"
        List<String> uniquePrefixes = sortedBC.keySet().stream().map(i->i.split("/")[0]).distinct().collect(Collectors.toList());
        System.out.println("\nuniquePrefixes: "+uniquePrefixes);        

        TreeMap<String,Integer> mapOfMaxValues = new TreeMap<>();
        // for each prefix from the list above, filter the entries from the sorted map
        // whose key has exactly this prefix (a plain startsWith would also match
        // longer tokens, e.g. "the" would match "then/rb"),
        // sort them by value in descending order and take the first, which has the highest value
        uniquePrefixes.stream().forEach(i->{ 
                Entry <String,Integer> e = 
                sortedBC.entrySet().stream().filter(j->j.getKey().split("/")[0].equals(i))
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder())).findFirst().get();

                mapOfMaxValues.put(e.getKey(), e.getValue());
            });

        System.out.println("\nmapOfMaxValues...\n");
        printMap(mapOfMaxValues);  
    }
    //pretty print a map
    public static <K, V> void printMap(Map<K, V> map) {
        map.entrySet().stream().forEach((entry) -> {
            System.out.println("Key : " + entry.getKey()
                    + " Value : " + entry.getValue());
        });
    }
}

// note: only tested with random values provided in the code 
// behavior for large maps untested
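The program above filters and sorts the whole map once per prefix, which can become slow when there are many distinct tokens. As a hedged alternative, the sketch below collects the maximum per prefix in a single stream pass using Collectors.toMap with a merge function. The names StreamMaxSketch and maxPerPrefix are hypothetical and not part of the answer above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class StreamMaxSketch {

    // Returns the "prefix/tag" entry with the highest value for every prefix.
    static Map<String, Integer> maxPerPrefix(Map<String, Integer> bc) {
        // prefix -> best entry so far; on a collision keep the entry with the larger value.
        Map<String, Map.Entry<String, Integer>> bestByPrefix = bc.entrySet().stream()
                .collect(Collectors.toMap(
                        e -> e.getKey().split("/")[0],
                        e -> e,
                        (a, b) -> a.getValue() >= b.getValue() ? a : b));

        // Copy the winners into a TreeMap so the result is ordered by key.
        Map<String, Integer> mapOfMaxValues = new TreeMap<>();
        for (Map.Entry<String, Integer> e : bestByPrefix.values()) {
            mapOfMaxValues.put(e.getKey(), e.getValue());
        }
        return mapOfMaxValues;
    }

    public static void main(String[] args) {
        Map<String, Integer> bc = new HashMap<>();
        bc.put("the/at", 5);
        bc.put("the/at3", 123);
        bc.put("that/cs", 14);
        bc.put("that/det", 153);
        System.out.println(maxPerPrefix(bc));
    }
}
```

Like the original program, this has only been reasoned through on small inputs; behavior when two tags tie for the maximum depends on the iteration order of the input map.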
