
Java HashMap array size

Asked 2017-03-22 01:46:22. Tags: java, oop, collections, java-8, hashmap

I am reading the implementation details of the Java 8 HashMap. Can anyone tell me why the initial array size of a Java HashMap is specifically 16? What is so special about 16? And why is it always a power of two? Thanks

3 Answers

Powers of 2 appear everywhere because computers represent numbers in binary (as they are in circuits), and certain arithmetic operations involving powers of 2 are simpler and faster to perform in binary (just think about how easy arithmetic with powers of 10 is in the decimal system we use). For example, multiplication is not a very efficient process in computers: circuits use a method similar to the one you use when multiplying two multi-digit numbers by hand. Multiplying or dividing by a power of 2, by contrast, only requires the computer to shift the bits left (to multiply) or right (to divide).
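A minimal sketch of the shift trick described above, using 16 = 2^4. The class and method names are hypothetical, chosen only for illustration. The right-shift version is restricted to non-negative inputs, because for negative numbers Java's / truncates toward zero while a shift does not.

```java
// Hypothetical demo: multiply/divide by 16 (2^4) using bit shifts.
final class ShiftDemo {
    static final int SHIFT = 4; // 1 << 4 == 16

    // x * 16 expressed as a left shift by 4 bits.
    static int times16(int x) {
        return x << SHIFT;
    }

    // x / 16 expressed as a right shift by 4 bits.
    // Only matches x / 16 for non-negative x, so negative input is rejected.
    static int div16(int x) {
        if (x < 0) throw new IllegalArgumentException("x must be non-negative");
        return x >>> SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(times16(3) + " " + (3 * 16));   // prints "48 48"
        System.out.println(div16(100) + " " + (100 / 16)); // prints "6 6"
    }
}
```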

As for why HashMap uses 16: 10 is a commonly used (and arbitrarily chosen) default capacity for dynamically growing structures (ArrayList's default capacity, for example, is 10), and 16 is not far from it while also being a power of 2.
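To see how the table size stays a power of two, here is a small sketch. Java 8's HashMap source declares its default as 1 << 4 (commented "aka 16") and, when you pass an initial capacity, rounds it up to a power of two in an internal helper named tableSizeFor. The helper below is a hypothetical re-implementation of that idea using Integer.highestOneBit, not the JDK's actual code; the 1 << 30 cap mirrors HashMap's maximum capacity.

```java
// Hypothetical sketch: round a requested capacity up to the next power of two,
// in the spirit of (but not copied from) HashMap.tableSizeFor.
final class CapacityDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int roundUpToPowerOfTwo(int cap) {
        if (cap <= 1) return 1;
        if (cap >= MAXIMUM_CAPACITY) return MAXIMUM_CAPACITY;
        // highestOneBit(cap - 1) is the largest power of two <= cap - 1;
        // doubling it gives the smallest power of two >= cap.
        return Integer.highestOneBit(cap - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_INITIAL_CAPACITY); // 16
        System.out.println(roundUpToPowerOfTwo(10));  // 16
        System.out.println(roundUpToPowerOfTwo(16));  // 16
        System.out.println(roundUpToPowerOfTwo(17));  // 32
    }
}
```

Because every table size is a power of two, the index computation discussed in the next answer can always use a bit mask.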

You can compute a modulus very efficiently when the divisor is a power of 2: n % d == n & (d - 1) whenever d is a power of 2 (and n is non-negative). Reducing the hash modulo the table length is what determines which index of the internal array an entry maps to, so this operation happens very often in a Java HashMap. A modulus normally requires a division, which is much less efficient than a bitwise AND (&). You can convince yourself of this by reading a book on digital logic.
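A quick check of the identity n % d == n & (d - 1) for a power-of-two d and non-negative n. The indexFor helper is a hypothetical name for the "hash & (length - 1)" computation; it assumes length is a power of two.

```java
// Hypothetical demo: for a power-of-two length and non-negative n,
// n % length and n & (length - 1) give the same bucket index.
final class ModMaskDemo {
    // Assumes 'length' is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int n : new int[] {0, 5, 16, 37, 1000}) {
            System.out.println(n + " % 16 = " + (n % length)
                    + ", " + n + " & 15 = " + indexFor(n, length));
        }
        // 1000 % 16 = 8, 1000 & 15 = 8 (and likewise for the other values)
    }
}
```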

Bitwise AND works this way for powers of two because every power of 2 is represented by a single bit set to 1; call its position t. When you subtract 1 from a power of 2, every bit below position t becomes 1, and bit t and every bit above it become 0. ANDing n with d - 1 therefore keeps the values of all bits of n below position t and sets the rest to 0.
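The bit patterns described above can be printed directly. This sketch uses hypothetical helper names; isPowerOfTwo relies on the same observation (a power of two ANDed with itself minus one is zero), and keepBitsBelow keeps only the bits of n below position t.

```java
// Hypothetical demo of the mask d - 1 for a power of two d = 1 << t.
final class MaskBitsDemo {
    // A positive number is a power of two iff it has exactly one 1 bit,
    // which is the case iff clearing its lowest 1 bit leaves zero.
    static boolean isPowerOfTwo(int d) {
        return d > 0 && (d & (d - 1)) == 0;
    }

    // Keep only the bits of n below position t, where d == 1 << t.
    static int keepBitsBelow(int n, int d) {
        if (!isPowerOfTwo(d)) throw new IllegalArgumentException("d must be a power of two");
        return n & (d - 1);
    }

    public static void main(String[] args) {
        int d = 16;       // binary 10000: the single 1 bit is at position t = 4
        int mask = d - 1; // binary  1111: every bit below t is 1
        int n = 0b100101; // 37
        System.out.println(Integer.toBinaryString(d));                   // 10000
        System.out.println(Integer.toBinaryString(mask));                // 1111
        System.out.println(Integer.toBinaryString(keepBitsBelow(n, d))); // 101
        // The mask has exactly t one-bits, where t = numberOfTrailingZeros(d).
        System.out.println(Integer.bitCount(mask) == Integer.numberOfTrailingZeros(d)); // true
    }
}
```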

But how does that help us? Remember that when dividing by a power of 10, you can count the zeroes following the 1 and take that many digits from the least significant end of the dividend to get the remainder. Example: 637989 % 1000 = 989. The same property holds for binary numbers with exactly one bit set to 1 and the rest set to 0. Example (in binary): 100101 % 001000 = 000101, i.e. 37 % 8 = 5.
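The two worked examples above can be reproduced in a few lines. binaryRemainder is a hypothetical helper that parses its arguments as base-2 strings; note that Integer.toBinaryString drops leading zeros, so 000101 prints as 101.

```java
// Hypothetical demo: remainders by a power of the base keep the low digits.
final class RemainderDemo {
    // Remainder of two base-2 strings, returned as a base-2 string.
    static String binaryRemainder(String dividend, String divisor) {
        int n = Integer.parseInt(dividend, 2);
        int d = Integer.parseInt(divisor, 2);
        return Integer.toBinaryString(n % d);
    }

    public static void main(String[] args) {
        // Decimal: dividing by 10^3 keeps the last three digits.
        System.out.println(637989 % 1000);                       // 989
        // Binary: dividing by 2^3 (001000) keeps the last three bits of 100101.
        System.out.println(binaryRemainder("100101", "001000")); // 101, i.e. 000101
    }
}
```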

There is one more reason to prefer hash & (n - 1) over the modulo operator, and that is negative hashes. hashCode() returns an int, which can of course be negative. In Java, the result of % takes the sign of the dividend, so a negative hash yields a negative (or zero) remainder, which is not a valid array index, whereas & with a non-negative mask never produces a negative result.
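A short illustration of the negative-hash problem, assuming a power-of-two table length of 16 and a hash value of -7 chosen for illustration. For a power-of-two length, masking gives the same result as Math.floorMod (available since Java 8), which always returns a non-negative value for a positive divisor.

```java
// Hypothetical demo: % versus & for a negative hash value.
final class NegativeHashDemo {
    // Bucket index via masking; assumes 'length' is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int hash = -7; // hashCode() is allowed to return negative values

        System.out.println(hash % length);               // -7: not a usable array index
        System.out.println(indexFor(hash, length));      // 9
        System.out.println(Math.floorMod(hash, length)); // 9
    }
}
```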

Another reason is that you want every slot in the array to be equally likely to be used. Since hash() is evenly distributed over 32 bits, if the array size did not evenly divide the size of the hash space, the leftover values would give the lower indexes a slightly higher chance of being used. Ideally, not just the hash but hash() % array_size is random and evenly distributed.

But this only really matters for data with a small hash range (such as a byte or a character).
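A sketch of the bias described in the last two paragraphs, assuming the hash values are exactly 0 to 255 (a byte-sized range) with each value equally likely. bucketCounts is a hypothetical helper that counts how many hash values land in each bucket when the index is hash % buckets.

```java
import java.util.Arrays;

// Hypothetical demo: bucket occupancy for a small hash range.
final class BucketBiasDemo {
    // Count how many of the hash values 0..(range - 1) map to each bucket
    // when the index is computed as hash % buckets.
    static int[] bucketCounts(int range, int buckets) {
        int[] counts = new int[buckets];
        for (int h = 0; h < range; h++) {
            counts[h % buckets]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 does not divide 256: the 6 lowest buckets get one extra value.
        System.out.println(Arrays.toString(bucketCounts(256, 10)));
        // [26, 26, 26, 26, 26, 26, 25, 25, 25, 25]

        // 16 divides 256: every bucket gets exactly 16 values.
        System.out.println(Arrays.toString(bucketCounts(256, 16)));
    }
}
```

With a full 32-bit hash space the same effect is negligible: treating the 2^32 values as unsigned, 2^32 = 10 × 429496729 + 6, so six buckets receive one extra value out of roughly 4.3 × 10^8 each.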