
How to see the distribution of keys in a HashMap?

When using a hash map, it's important to evenly distribute the keys over the buckets.

If all keys end up in the same bucket, you essentially end up with a list.

Is there a way to "audit" a HashMap in Java in order to see how well the keys are distributed?

I tried subtyping it and iterating Entry<K,V>[] table, but it's not visible.


Use the Reflection API!

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Note: on Java 16 and later, run with
// --add-opens java.base/java.util=ALL-UNNAMED
// otherwise setAccessible throws InaccessibleObjectException.
public class Main {
    // Simulates instances that are never equal but always go to the same bucket.
    static class A {
        @Override
        public boolean equals(Object obj) { return false; }

        @Override
        public int hashCode() { return 42; }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        // Test data
        HashMap<A, String> map = new HashMap<A, String>(4);
        map.put(new A(), "abc");
        map.put(new A(), "def");

        // Access the internal table
        Field table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
        Map.Entry<A, String>[] realTable = (Map.Entry<A, String>[]) table.get(map);

        // Iterate and pretty-print each bucket
        for (int i = 0; i < realTable.length; i++) {
            System.out.println(String.format("Bucket : %d, Entry: %s", i, bucketToString(realTable[i])));
        }
    }

    @SuppressWarnings("unchecked")
    private static String bucketToString(Map.Entry<A, String> entry) throws Exception {
        if (entry == null) return null;
        StringBuilder sb = new StringBuilder();

        // Access the "next" field of HashMap$Node
        Field next = entry.getClass().getDeclaredField("next");
        next.setAccessible(true);

        // Walk through the chain of entries in this bucket
        while (entry != null) {
            sb.append(entry);
            entry = (Map.Entry<A, String>) next.get(entry);
            if (null != entry) sb.append(" -> ");
        }
        return sb.toString();
    }
}

In the end you'll see something like this in STDOUT:

 Bucket : 0, Entry: null 
 Bucket : 1, Entry: null 
 Bucket : 2, Entry: Main$A@2a=abc -> Main$A@2a=def 
 Bucket : 3, Entry: null

HashMap uses the hash codes produced by the hashCode() method of your key objects, so I guess you are really asking how evenly distributed those hash code values are. You can get hold of the key objects using Map.keySet().

Now, the OpenJDK and Oracle implementations of HashMap do not use the key hash codes directly, but apply another hashing function to the provided hashes before distributing them over the buckets. You should not rely on this implementation detail, so you ought to ignore it and just ensure that the hashCode() methods of your key objects produce well-distributed values.

Examining the actual hash codes of some sample key objects is unlikely to tell you anything useful unless your hash code method is very poor. You would be better off doing a basic theoretical analysis of your hash code method. This is not as scary as it might sound. You may (indeed, you have no choice but to) assume that the hash code methods of the supplied Java classes are well distributed. Then you just need to check that the means you use for combining the hash codes of your data members behaves well for the expected values of those members. This is likely to be a problem only if your data members have values that are highly correlated in a peculiar way.
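If you do want a quick look at the raw hash codes anyway, here is a minimal sketch that uses only Map.keySet() and no reflection. The class name HashCodeHistogram, the histogram helper and the bucket count of 16 are my own assumptions for illustration. It reduces each hashCode() modulo a chosen bucket count, which deliberately ignores HashMap's internal extra hashing step, so treat the result as an approximation of the real bucket layout.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class HashCodeHistogram {
    // Hypothetical helper: counts how many keys would fall into each of
    // `buckets` slots if the slot were simply hashCode() modulo `buckets`.
    // This is NOT what HashMap does internally; it only approximates it.
    static int[] histogram(Map<?, ?> map, int buckets) {
        int[] counts = new int[buckets];
        for (Object key : map.keySet()) {
            int h = (key == null) ? 0 : key.hashCode();
            // floorMod keeps the slot non-negative for negative hash codes
            counts[Math.floorMod(h, buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data, made up for this sketch
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put("key-" + i, i);
        }
        System.out.println(Arrays.toString(histogram(map, 16)));
    }
}
```

A badly skewed histogram (most keys in a handful of slots) suggests the hashCode() method deserves a closer look; a roughly flat one is what you are hoping for.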

You can use reflection to access the hidden fields:

HashMap map = ...;

// get the HashMap#table field
Field tableField = HashMap.class.getDeclaredField("table");
tableField.setAccessible(true);

Object[] table = (Object[]) tableField.get(map);
int[] counts = new int[table.length];

// get the HashMap.Node#next field
Class<?> entryClass = table.getClass().getComponentType();
Field nextField = entryClass.getDeclaredField("next");
nextField.setAccessible(true);

for (int i = 0; i < table.length; i++) {
    Object e = table[i];
    int count = 0;
    if (e != null) {
        do {
            count++;
        } while ((e = nextField.get(e)) != null);
    }
    counts[i] = count;
}

Now you have an array of the entry counts for each bucket.
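Once you have such a per-bucket count array, a few summary numbers make the distribution easier to judge. The following is only a sketch: the class name BucketStats and its helper methods are hypothetical, and the sample array in main is made-up data shaped like the two-keys-in-one-bucket case from earlier in this thread.

```java
public class BucketStats {
    // Total number of entries across all buckets
    static int totalEntries(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Number of buckets that hold at least one entry
    static int usedBuckets(int[] counts) {
        int used = 0;
        for (int c : counts) {
            if (c > 0) {
                used++;
            }
        }
        return used;
    }

    // Length of the longest chain, i.e. the worst-case lookup cost
    static int longestChain(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        // Sample data for illustration: four buckets, two entries in bucket 2
        int[] counts = {0, 0, 2, 0};
        System.out.println("entries=" + totalEntries(counts)
                + " usedBuckets=" + usedBuckets(counts)
                + " longestChain=" + longestChain(counts));
    }
}
```

If the longest chain is close to the total number of entries, the map has degenerated into the list the question worries about.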

Client.java

import java.util.HashMap;
import java.util.Map;

public class Client {
    public static void main(String[] args) {
        Map<Example, Number> m = new HashMap<>();
        Example e1 = new Example(100);  // point 1
        Example e2 = new Example(200);  // point 2
        Example e3 = new Example(300);  // point 3
        m.put(e1, 10);
        m.put(e2, 20);
        m.put(e3, 30);
        System.out.println(m);  // point 4
    }
}

Example.java

public class Example {
    int s;
    Example(int s) {
        this.s =s;
    }
    @Override
    public int hashCode() {
        // TODO Auto-generated method stub
        return 5;
    }
}

At point 1, point 2 and point 3 in Client.java we create three keys of type Example, which are then inserted into the HashMap m. Since hashCode() is overridden in Example.java to return a constant, all three keys e1, e2 and e3 return the same hash code and therefore land in the same bucket of the map.

Now the problem is how to see the distribution of keys.

Approach:

  1. Set a breakpoint at point 4 in Client.java.
  2. Debug the Java application.
  3. Inspect m.
  4. Inside m, you will find the table array, of type HashMap$Node[] and length 16.
  5. This is literally the hash table. Each index holds a linked list of the entries that were inserted into the map. Each non-null entry has a hash field holding the value returned by HashMap's hash() method. That hash value determines the index in the table array where the entry is stored: in Java 7 and earlier via the indexFor() method, in Java 8 and later inline as (n - 1) & hash, where n is the table length. (Refer to @Rahul's link in the comments on the question to understand the concept of hash and indexFor.)
  6. For the case above, if you inspect table, you will find that every slot but one is null.
  7. We inserted three keys but can see only one non-null slot, i.e. all three keys went into the same bucket (the same index of table).
  8. Inspect the non-null table array element (in this case at index 5): key corresponds to e1, while value corresponds to 10 (point 1).
  9. The next variable here points to the next node of the linked list, i.e. the next entry, which is (e2, 20) in our case.

So in this way you can inspect the hashmap.
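To see why the debugger shows the entries at index 5 of a 16-slot table, here is a small sketch of the index calculation as I understand it from OpenJDK 8 and later. The class name BucketIndexSketch and its two methods are hypothetical names of mine, not part of the JDK, and the calculation is an implementation detail that may differ in other versions or vendors.

```java
public class BucketIndexSketch {
    // Spreading step as done by HashMap.hash() in OpenJDK 8+:
    // XOR the high 16 bits of the hash code into the low 16 bits.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int bucketIndex(int hashCode, int tableLength) {
        return (tableLength - 1) & spread(hashCode);
    }

    public static void main(String[] args) {
        // Example.hashCode() returns 5 and the default table length is 16
        System.out.println(bucketIndex(5, 16));
        // Key with hashCode() 42 in a table of length 4
        System.out.println(bucketIndex(42, 4));
    }
}
```

For small hash codes such as 5 or 42 the spreading step changes nothing, so the index is simply the low bits of the hash code.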

I would also recommend reading through the internal implementation of HashMap to understand it thoroughly.

Hope it helped.
