
Efficient way to invert a hashmap with a small number of keys mapping to the same values

I have a hashmap in which I know that some keys map to the same values.
The number of these keys is very small (less than 6%), and each shared value is mapped to by only 2-4 keys.
E.g.

Map<String, String> map = new HashMap<>();  
map.put("codeA", "100");  
map.put("codeB", "7");  
map.put("codeC", "0012");   

I need to create an inverse of this map, from the values to the keys, so I did:

// Assumes java.util.ArrayList, java.util.HashMap and java.util.Map are imported.
Map<String, ArrayList<String>> inverseMap = new HashMap<>();
for (Map.Entry<String, String> e : map.entrySet()) {
    String code = e.getKey();
    String val = e.getValue();
    ArrayList<String> codesColliding = inverseMap.get(val);
    if (codesColliding == null) {
        // First time this value is seen: start a list for its codes.
        codesColliding = new ArrayList<>(4);
        inverseMap.put(val, codesColliding);
    }
    codesColliding.add(code);
}

This works, but I think it is suboptimal, as I am using more memory than needed for the vast majority of the keys.
Although it works from a coding perspective, I was wondering whether this can be approached differently (via other data structures?).
Note: I am interested in plain Java 7 approaches (no extra libraries).

If the values of the inverse map need to be able to accommodate multiple keys from the original map, then there is no avoiding some overhead relative to the case where they do not. Your current approach isn't bad, but if so small a percentage of the original map's values are duplicated, and none are duplicated more than a handful of times, then I'd be even more stingy with the initial capacities of the lists you use as values in the inverse map. Why pre-allocate more than one element? You'll rarely need to re-allocate, and when you do, the list will handle it transparently.
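As a minimal sketch of that suggestion, the loop from the question can be kept as it is, with only the list capacity lowered to 1. The class name SmallListInverter and the method name invert are made up for this illustration, and the code sticks to Java 7 features.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, named for illustration only.
final class SmallListInverter {

    // Builds value -> keys, reserving room for a single key per list.
    static Map<String, List<String>> invert(Map<String, String> map) {
        Map<String, List<String>> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            List<String> keys = inverse.get(e.getValue());
            if (keys == null) {
                // Capacity 1: the common case is exactly one key per value.
                keys = new ArrayList<>(1);
                inverse.put(e.getValue(), keys);
            }
            // For the rare duplicated value, ArrayList grows on its own.
            keys.add(e.getKey());
        }
        return inverse;
    }
}
```

The only difference from the original loop is the argument passed to the ArrayList constructor, so the calling code does not need to change.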

Maybe the easiest approach is to create a class that has two HashMaps, one for non-colliding keys and the other for keys that collide. If you disambiguate the collisions in a certain way (e.g., you always pick the first one alphabetically), you can add that logic to the class. Or, if you want to return ArrayLists, you can lazily wrap non-colliding Strings in an ArrayList.

It's all about knowing what you want to do with the Map. You can even sacrifice some type safety if you are confident your code can handle disambiguating between String and ArrayList results.
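A rough sketch of that last idea, assuming the inverse map is declared as Map<String, Object> and each entry holds either a bare String (one key) or a List<String> (several keys). The class name MixedValueInverter is hypothetical; callers would have to check the result with instanceof.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, named for illustration only.
final class MixedValueInverter {

    // Each value maps to a String (single key) or a List<String> (2+ keys).
    static Map<String, Object> invert(Map<String, String> map) {
        Map<String, Object> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            Object existing = inverse.get(e.getValue());
            if (existing == null) {
                // Common case: store the key itself, no list allocated.
                inverse.put(e.getValue(), e.getKey());
            } else if (existing instanceof String) {
                // Second key for this value: upgrade the entry to a list.
                List<String> keys = new ArrayList<>(2);
                keys.add((String) existing);
                keys.add(e.getKey());
                inverse.put(e.getValue(), keys);
            } else {
                // Third or later key: the entry is already a list.
                @SuppressWarnings("unchecked")
                List<String> keys = (List<String>) existing;
                keys.add(e.getKey());
            }
        }
        return inverse;
    }
}
```

This avoids allocating a list for the non-duplicated values, at the cost of an unchecked cast and an instanceof test at every lookup. It also assumes the original map contains no null keys.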

I know you're talking about a Map<String,String>, but just for clarity let's generalize it to Map<K,V>, from which you're building a Map<V,Collection<K>>. Add another Map<V,K>, maybe call it uniqueInverseMap. As you scan through the entries, always look up each entry's value first in inverseMap, then in uniqueInverseMap. If it's already in inverseMap, add the key to the existing collection. If it's already in uniqueInverseMap, remove it, create a new two-element list, and add the list to inverseMap. Otherwise, put it in uniqueInverseMap.
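The scan described above could look roughly like the following. This is a sketch, not a definitive implementation: the field names inverseMap and uniqueInverseMap follow the answer, while the class name TwoMapInverse and the lookup method keysFor are invented for the example. The lookup wraps a lone key in an immutable singleton list on demand, which is a cheaper stand-in for wrapping it in an ArrayList.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around the two maps, named for illustration only.
final class TwoMapInverse<K, V> {
    private final Map<V, K> uniqueInverseMap = new HashMap<>();       // value -> its only key
    private final Map<V, Collection<K>> inverseMap = new HashMap<>(); // value -> 2+ keys

    TwoMapInverse(Map<K, V> source) {
        for (Map.Entry<K, V> e : source.entrySet()) {
            add(e.getKey(), e.getValue());
        }
    }

    private void add(K key, V value) {
        // Check inverseMap first: the value may already have several keys.
        Collection<K> keys = inverseMap.get(value);
        if (keys != null) {
            keys.add(key);
        } else if (uniqueInverseMap.containsKey(value)) {
            // Second key for this value: move it from the unique map to a list.
            K first = uniqueInverseMap.remove(value);
            keys = new ArrayList<>(2);
            keys.add(first);
            keys.add(key);
            inverseMap.put(value, keys);
        } else {
            // First (and usually only) key for this value.
            uniqueInverseMap.put(value, key);
        }
    }

    // Returns every key that mapped to the given value, or an empty collection.
    Collection<K> keysFor(V value) {
        Collection<K> keys = inverseMap.get(value);
        if (keys != null) {
            return Collections.unmodifiableCollection(keys);
        }
        if (uniqueInverseMap.containsKey(value)) {
            return Collections.singletonList(uniqueInverseMap.get(value));
        }
        return Collections.<K>emptyList();
    }
}
```

With this layout, the non-duplicated values cost one plain map entry each, and a list is only allocated for the few values that are actually shared.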
