
How can I compare HashMaps with different keys?

I have a clustering algorithm that stores its clusters in a HashMap<String, ArrayList<String>>.

I need to compare it with the ground truth, which is stored in another HashMap<String, ArrayList<String>>.

The keys are not the same, because the generated map is built by incremental clustering, so I was wondering how I can compare the original clusters with the generated ones.

I'm using NMI and BCubed as clustering evaluation measures, but my problem is how to refer to the same cluster (ArrayList) when the two maps use different keys.

Any ideas?

I'm not 100% clear on how your class is set up or how the HashMap<String, ArrayList<String>> is really meant to work, but my inclination is that you should have a HashMap of your keys. So as you're assigning clusters you could do something like this.

Original Data:

HashMap<String, ArrayList<String>> = Key: Array of Original Data

You would then store the contents of each cluster as:

HashMap<String, ArrayList<String>> = Cluster Key: Array of Original Data Keys

That would give you a mechanism to cycle through the objects in each cluster while still maintaining the state of the original objects. Does that make sense? You'd ultimately be able to write something akin to:

for (String clusterKey : clusterMap.keySet()) {
    for (String itemKey : clusterMap.get(clusterKey)) {
        // Look up the original item by its key and compare it to the centroid.
        calculateDistance(centroid, originalMap.get(itemKey));
    }
}

That is a gross oversimplification, but it should get you going in the right direction.

Edit: I also assume there is a HashMap from each cluster key to its centroid. So the calculateDistance() call could be rewritten as calculateDistance(centroidMap.get(clusterKey), originalMap.get(itemKey));
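For what it's worth, here is a minimal sketch of that layout. It assumes each original item is a numeric vector (double[]) and that calculateDistance() is plain Euclidean distance; the answer specifies neither. The class name ClusterLayoutSketch and the helper distanceToCentroids() are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClusterLayoutSketch {

    // Assumption: items and centroids are equal-length vectors,
    // and "distance" means Euclidean distance.
    static double calculateDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // For every cluster, sum the distances from its centroid to each member.
    // clusterMap holds only the KEYS of the original items, not the items.
    static Map<String, Double> distanceToCentroids(
            Map<String, double[]> originalMap,
            Map<String, List<String>> clusterMap,
            Map<String, double[]> centroidMap) {
        Map<String, Double> totals = new HashMap<>();
        for (Map.Entry<String, List<String>> cluster : clusterMap.entrySet()) {
            double[] centroid = centroidMap.get(cluster.getKey());
            double total = 0.0;
            for (String itemKey : cluster.getValue()) {
                total += calculateDistance(centroid, originalMap.get(itemKey));
            }
            totals.put(cluster.getKey(), total);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, double[]> originalMap = new HashMap<>();
        originalMap.put("a", new double[] {0.0, 0.0});
        originalMap.put("b", new double[] {3.0, 4.0});

        Map<String, List<String>> clusterMap = new HashMap<>();
        clusterMap.put("c1", Arrays.asList("a", "b"));

        Map<String, double[]> centroidMap = new HashMap<>();
        centroidMap.put("c1", new double[] {0.0, 0.0});

        System.out.println(distanceToCentroids(originalMap, clusterMap, centroidMap));
    }
}
```

Because the clusters only hold item keys, the same originalMap can be shared between the generated clustering and the ground truth.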

I guess you can create a reverse HashMap whose key (call it KeyObj) is a new object wrapping an ArrayList and a counter.

In the reverse HashMap, the counter of each key is the number of ArrayLists in the second HashMap that are equal to it.

Now the comparison algorithm is easy: iterate through the values of the first HashMap and look up each value as a key in the reverse HashMap. If the key is not found, or it is found but its counter is 0, return false. Otherwise, decrement the counter and continue to the next iteration. At the end, return true.
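The steps above could be sketched roughly like this. It is a simplification under a few assumptions that are mine, not the answer's: the counter lives in the map value (Map<List<String>, Integer>) instead of inside a KeyObj, each cluster is normalized to a sorted copy so member order does not matter, and a size check is added up front so that extra clusters in the second map are not silently ignored. The names ClusterMultisetCompare, normalize and sameClusters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClusterMultisetCompare {

    // Order-insensitive key for a cluster: a sorted copy of its members.
    static List<String> normalize(List<String> members) {
        List<String> copy = new ArrayList<>(members);
        Collections.sort(copy);
        return copy;
    }

    // True if both maps contain the same clusters, regardless of their keys.
    static boolean sameClusters(Map<String, ? extends List<String>> first,
                                Map<String, ? extends List<String>> second) {
        if (first.size() != second.size()) {
            return false;
        }
        // "Reverse" map: cluster contents -> how many times they occur in second.
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> members : second.values()) {
            counts.merge(normalize(members), 1, Integer::sum);
        }
        // Consume one occurrence for every cluster in first.
        for (List<String> members : first.values()) {
            List<String> key = normalize(members);
            Integer remaining = counts.get(key);
            if (remaining == null || remaining == 0) {
                return false;
            }
            counts.put(key, remaining - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, ArrayList<String>> generated = new HashMap<>();
        generated.put("0", new ArrayList<>(Arrays.asList("a", "b")));
        generated.put("1", new ArrayList<>(Arrays.asList("c")));

        Map<String, ArrayList<String>> groundTruth = new HashMap<>();
        groundTruth.put("x", new ArrayList<>(Arrays.asList("c")));
        groundTruth.put("y", new ArrayList<>(Arrays.asList("b", "a")));

        System.out.println(sameClusters(generated, groundTruth));
    }
}
```

Note that this only answers whether the two clusterings are identical; it gives a yes/no result, not a score.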

