
How does the Java HashMap internal data structure change during rehashing?

I am trying to write demo code to show that rehashing happens in a HashMap when the map size exceeds the load factor threshold. How can I prove that rehashing is happening internally? Also, I want to prove that even though the old entries are moved to new buckets during the rehash, I can still get the old elements using the old keys (let me know if my assumption is correct). Below is the sample code.

import java.util.*;

class RehashDemo {

    public static void main(String[] args) {
        // Start with an initial capacity hint of 10 and fill in ten entries.
        Map<Integer, String> numbers = new HashMap<>(10);
        for (int i = 0; i < 10; i++) {
            numbers.put(i, i + "");
        }
        System.out.println(numbers);

        // Add six more entries so the size crosses the load-factor threshold.
        for (int j = 15; j <= 20; j++) {
            numbers.put(j, j + "");
        }
        System.out.println(numbers);
    }
}

It's not difficult to write a program to demonstrate rehashing, but you have to understand a lot about HashMap's internal organization: how objects' hash codes are generated, how hash codes relate to HashMap's internal structures, and how this affects iteration order.

Briefly, a HashMap consists of an array of buckets (the "table"). Each bucket is a linked list of key-value pairs. A pair whose key hashes to an already-occupied bucket is added to the end of that bucket's linked list. The bucket is determined by calling the key's hashCode() method, XORing the result with itself unsigned-right-shifted by 16 bits (which folds the high 16 bits into the low 16; see source), and then taking that value modulo the table size. Since the table size is always a power of two, this is essentially ANDing with a mask of (tablesize-1). The hash code of an Integer object is simply its integer value (source). Finally, the iteration order of a HashMap steps through each bucket sequentially, and also sequentially through the linked list of pairs within each bucket.

After all that, you can see that small integer values will end up in the corresponding buckets. For example, Integer.valueOf(0).hashCode() is 0. It remains 0 after the shift-and-XOR, and remains 0 modulo any table size. Thus, Integer 0 ends up in bucket 0, Integer 1 in bucket 1, and so forth. But don't forget that the bucket index is taken modulo the table size, so if the table size is 8, Integer 8 also ends up in bucket 0.
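To make the bucket arithmetic concrete, here is a minimal sketch that reproduces the spread-and-mask computation described above for a few small Integer keys (the helper names spread and bucketIndex are just for illustration, not HashMap API):

public class BucketIndexDemo {
    // Spread the hash code the way HashMap does: fold the high 16 bits into the low 16.
    static int spread(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // The table size is always a power of two, so "modulo" is a simple bit mask.
    static int bucketIndex(Object key, int tableSize) {
        return spread(key) & (tableSize - 1);
    }

    public static void main(String[] args) {
        for (int k : new int[] {0, 8, 1, 9, 2, 10}) {
            System.out.printf("key %2d -> bucket %2d (table size 8), bucket %2d (table size 16)%n",
                    k, bucketIndex(k, 8), bucketIndex(k, 16));
        }
    }
}

Running this prints the predicted bucket index for each key at both table sizes.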

With this information, we can populate a HashMap with Integer keys that will end up in predictable buckets. Let's create a HashMap with a table size of 8 and the default load factor of 0.75, meaning that we can add six mappings before rehashing occurs.

Map<Integer, Integer> map = new HashMap<>(8);
map.put(0, 0);
map.put(8, 8);
map.put(1, 1);
map.put(9, 9);
map.put(2, 2);
map.put(10, 10);

{0=0, 8=8, 1=1, 9=9, 2=2, 10=10}

Printing out the map (essentially, using its toString() method) iterates the map sequentially as described above. We can see that 0 and 8 end up in the first bucket, 1 and 9 in the second, and 2 and 10 in the third. Now let's add another entry:

map.put(3, 3);

{0=0, 1=1, 2=2, 3=3, 8=8, 9=9, 10=10}

The iteration order changed! Adding the new mapping exceeded the rehashing threshold, so the table size was doubled to 16 and the entries were rehashed, this time modulo 16 instead of 8. Whereas 0 and 8 were both in bucket 0 before, they are now in separate buckets, since there are twice as many buckets available. The same goes for 1/9 and 2/10. The second entry of each bucket under the old table size of 8 now hashes to its own bucket when the table size is 16. You can see this because iteration proceeds sequentially through the buckets, and there is now one entry in each bucket.

Of course, I chose the integer values carefully so that collisions occur with a table size of 8 and do not occur with a table size of 16. That lets us see the rehashing very clearly. With more typical objects, the hash codes (and thus the buckets) are harder to predict, so it's harder to see the collisions and what gets shifted around when rehashing occurs.
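If you want to observe the resize directly rather than inferring it from iteration order, a sketch like the one below peeks at HashMap's internal "table" array via reflection and also checks that the old keys still resolve after the rehash. The field name "table" is an OpenJDK implementation detail, not public API, and on newer JDKs you may need to run with --add-opens java.base/java.util=ALL-UNNAMED for the reflective access to succeed:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class RehashProbe {
    // Returns the current length of the map's internal bucket array
    // (0 if the table has not been allocated yet, since it is created lazily).
    static int tableLength(Map<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, Integer> map = new HashMap<>(8);
        for (int k : new int[] {0, 8, 1, 9, 2, 10}) {
            map.put(k, k);
        }
        System.out.println("before 7th put: table length = " + tableLength(map)); // expect 8

        map.put(3, 3); // crosses the 0.75 * 8 = 6 threshold and triggers a resize
        System.out.println("after 7th put:  table length = " + tableLength(map)); // expect 16

        // The old keys still map to their values after rehashing.
        System.out.println("map.get(8)  = " + map.get(8));
        System.out.println("map.get(10) = " + map.get(10));
    }
}

If the reported length jumps from 8 to 16 after the seventh put, you have watched the rehash happen, and the successful lookups show that the old keys are unaffected by it.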
