How to see the distribution of keys in a HashMap?
When using a hash map, it's important to distribute the keys evenly over the buckets.
If all keys end up in the same bucket, you essentially end up with a list.
Is there a way to "audit" a HashMap in Java in order to see how well the keys are distributed?
I tried subclassing it and iterating over the Entry<K,V>[] table field, but it's not visible.
Use the Reflection API!
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Simulates instances that are never equal but always go to the same bucket.
    static class A {
        @Override
        public boolean equals(Object obj) { return false; }
        @Override
        public int hashCode() { return 42; }
    }

    // On JDK 16+ run with: --add-opens java.base/java.util=ALL-UNNAMED
    public static void main(String[] args) throws Exception {
        // Test data
        HashMap<A, String> map = new HashMap<>(4);
        map.put(new A(), "abc");
        map.put(new A(), "def");
        // Access the internal table
        Field table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
        @SuppressWarnings("unchecked")
        Map.Entry<A, String>[] realTable = (Map.Entry<A, String>[]) table.get(map);
        // Iterate and pretty-print each bucket
        for (int i = 0; i < realTable.length; i++) {
            System.out.println(String.format("Bucket : %d, Entry: %s", i, bucketToString(realTable[i])));
        }
    }

    @SuppressWarnings("unchecked")
    private static String bucketToString(Map.Entry<A, String> entry) throws Exception {
        if (entry == null) return null;
        StringBuilder sb = new StringBuilder();
        // Access the "next" field of HashMap$Node
        Field next = entry.getClass().getDeclaredField("next");
        next.setAccessible(true);
        // Walk the chain of entries in this bucket
        while (entry != null) {
            sb.append(entry);
            entry = (Map.Entry<A, String>) next.get(entry);
            if (entry != null) sb.append(" -> ");
        }
        return sb.toString();
    }
}
In the end you'll see something like this on STDOUT:
Bucket : 0, Entry: null
Bucket : 1, Entry: null
Bucket : 2, Entry: Main$A@2a=abc -> Main$A@2a=def
Bucket : 3, Entry: null
HashMap uses the hash codes produced by the hashCode() method of your key objects, so I guess you are really asking how evenly distributed those hash code values are. You can get hold of the key objects using Map.keySet().
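As a rough illustration of that idea, the sketch below tallies how many keys share each hashCode() value. The class and method names (HashCodeAudit, hashCodeFrequencies, worstCollision) are invented for this example; in practice you would pass in map.keySet().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the JDK.
final class HashCodeAudit {
    // Counts how many keys share each hashCode() value.
    static Map<Integer, Integer> hashCodeFrequencies(Iterable<?> keys) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (Object key : keys) {
            int h = (key == null) ? 0 : key.hashCode();
            freq.merge(h, 1, Integer::sum);
        }
        return freq;
    }

    // Largest number of keys that share one hash code.
    static int worstCollision(Iterable<?> keys) {
        int worst = 0;
        for (int n : hashCodeFrequencies(keys).values()) {
            worst = Math.max(worst, n);
        }
        return worst;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112 under String.hashCode().
        List<String> keys = Arrays.asList("Aa", "BB", "C");
        System.out.println(hashCodeFrequencies(keys));
        System.out.println(worstCollision(keys)); // prints 2
    }
}
```

A worst-case count close to the number of keys suggests a degenerate hashCode(); a count of 1 or 2 says little either way, because distinct hash codes can still share a bucket.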
Now, the OpenJDK and Oracle implementations of HashMap do not use the key hash codes directly, but apply another hashing function to the provided hashes before distributing them over the buckets. But you should not rely on or use this implementation detail, so you ought to ignore it. You should just ensure that the hashCode() methods of your keys produce well-distributed values.
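For the curious, here is a sketch of what that extra step looks like in OpenJDK 8 and later: the upper 16 bits of the hash code are XORed into the lower 16, and the result is masked with the table length minus one (the table length is always a power of two). The class and method names are made up for this sketch; the real logic is private to HashMap and may change between releases, so treat it as an illustration only.

```java
// Hypothetical illustration of an OpenJDK implementation detail.
final class BucketIndexSketch {
    // Mirrors the bit-spreading step: fold the high 16 bits into the low 16.
    static int spread(Object key) {
        if (key == null) return 0;
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two, as HashMap's table is.
    static int bucketIndex(Object key, int tableLength) {
        return spread(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself.
        System.out.println(bucketIndex(42, 4));     // prints 2
        System.out.println(bucketIndex(65536, 16)); // prints 1
    }
}
```

The second call shows why the spreading exists: without it, 65536 would land in bucket 0 along with every other multiple of 16.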
Examining the actual hash codes of some sample key objects is unlikely to tell you anything useful unless your hashCode() method is very poor. You would be better off doing a basic theoretical analysis of your hashCode() method. This is not as scary as it might sound. You may assume (indeed, you have no choice but to assume) that the hashCode() methods of the supplied Java classes are well distributed. Then you just need to check that the way you combine the hash codes of your data members behaves well for the expected values of those members. This is likely to be a problem only if your data members have values that are highly correlated in a peculiar way.
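To make the "highly correlated" case concrete, here is a small sketch comparing two ways of combining the hash codes of two int fields. The names xorHash and mixedHash are invented for this example, and the data (both fields always equal) is a deliberately extreme assumption.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical comparison of two hash-combining strategies.
final class CombineSketch {
    // Naive combination: XOR cancels out whenever the two fields are equal.
    static int xorHash(int x, int y) {
        return Integer.hashCode(x) ^ Integer.hashCode(y);
    }

    // Conventional combination: Objects.hash uses the 31 * h + field scheme.
    static int mixedHash(int x, int y) {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<Integer> xor = new HashSet<>();
        Set<Integer> mixed = new HashSet<>();
        // Highly correlated data: both fields always hold the same value.
        for (int i = 0; i < 1000; i++) {
            xor.add(xorHash(i, i));
            mixed.add(mixedHash(i, i));
        }
        System.out.println(xor.size());   // prints 1
        System.out.println(mixed.size()); // prints 1000
    }
}
```

With XOR every key collapses to hash code 0, while the multiplicative scheme keeps all 1000 keys distinct for this particular data.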
You can use reflection to access the hidden fields:
HashMap map = ...;
// get the HashMap#table field
// (the table is allocated lazily, so put at least one entry first)
Field tableField = HashMap.class.getDeclaredField("table");
tableField.setAccessible(true);
Object[] table = (Object[]) tableField.get(map);
int[] counts = new int[table.length];
// get the HashMap.Node#next field
Class<?> entryClass = table.getClass().getComponentType();
Field nextField = entryClass.getDeclaredField("next");
nextField.setAccessible(true);
// walk each bucket's chain and count its entries
for (int i = 0; i < table.length; i++) {
    Object e = table[i];
    int count = 0;
    if (e != null) {
        do {
            count++;
        } while ((e = nextField.get(e)) != null);
    }
    counts[i] = count;
}
Now you have an array of the entry counts for each bucket.
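Once you have such a counts array, a few summary figures are usually easier to read than the raw numbers. The following is a minimal sketch; BucketStats and its method names are hypothetical, and the sample array is simply the four-bucket output shown earlier on this page.

```java
// Hypothetical helper that summarises a per-bucket entry count array.
final class BucketStats {
    // Length of the longest collision chain.
    static int longestChain(int[] counts) {
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        return max;
    }

    // Number of buckets that hold no entries at all.
    static int emptyBuckets(int[] counts) {
        int empty = 0;
        for (int c : counts) if (c == 0) empty++;
        return empty;
    }

    // Total number of entries across all buckets.
    static int totalEntries(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    public static void main(String[] args) {
        int[] counts = {0, 0, 2, 0}; // two entries chained in bucket 2
        System.out.println(longestChain(counts)); // prints 2
        System.out.println(emptyBuckets(counts)); // prints 3
        System.out.println(totalEntries(counts)); // prints 2
    }
}
```

A longest chain that grows with the number of entries, alongside many empty buckets, is the signature of a poorly distributed hashCode().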
Client.java
import java.util.HashMap;
import java.util.Map;

public class Client {
    public static void main(String[] args) {
        Map<Example, Number> m = new HashMap<>();
        Example e1 = new Example(100); // point 1
        Example e2 = new Example(200); // point 2
        Example e3 = new Example(300); // point 3
        m.put(e1, 10);
        m.put(e2, 20);
        m.put(e3, 30);
        System.out.println(m); // point 4
    }
}
Example.java
public class Example {
    int s;

    Example(int s) {
        this.s = s;
    }

    @Override
    public int hashCode() {
        // Constant hash code: every instance lands in the same bucket.
        return 5;
    }
}
At point 1, point 2 and point 3 in Client.java we create three keys of type Example, which are then inserted into the HashMap m. Since hashCode() is overridden in Example.java to return a constant, all three keys e1, e2 and e3 return the same hash code and hence land in the same bucket of the HashMap.
Now the problem is how to see the distribution of keys.
Approach:
In the table array element for this bucket (index 5 in this case), key corresponds to e1, while value corresponds to 10 (point 1).
The next variable here points to the next node of the linked list, i.e. the next Entry object, which is (e2, 20) in our case. So in this way you can inspect the HashMap.
Also, I would recommend going through the internal implementation of HashMap to understand it thoroughly.
Hope it helped.