
Is HashTable/HashMap an array?

Original post: 2017-09-29 16:42:52 · Tags: java, hash, hashmap, hashtable

I am confused about hashing:

When we use Hashtable/HashMap (key, value), my first understanding was that the internal data structure is an array (already allocated in memory).

Java's hashCode() method has an int return type, so I thought this hash value would be used as an index into the array. In that case we would need 2^32 entries in the array in RAM, which is not what actually happens.

So does Java derive an index with a smaller range from the hashCode()?

Answer:

As the people below pointed out, and as the source shows: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java

HashMap is backed by an array. The hashCode() is rehashed, but the result is still an int, and the index into the array becomes h & (length-1). So if the length of the array is 2^n, the index is simply the lowest n bits of the rehashed value.
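To make the h & (length-1) step concrete, here is a minimal sketch. It assumes the JDK 8 version of HashMap, whose source computes `hash = h ^ (h >>> 16)` and then indexes with `(n - 1) & hash`; the JDK 6 source linked above uses a different rehash function. The class and method names are hypothetical.

```java
// Sketch of how a JDK 8-style HashMap turns hashCode() into a bucket index.
// spread() mirrors the JDK 8 source of HashMap.hash(); the names here are hypothetical.
public class BucketIndex {

    // Mix the high 16 bits into the low 16 bits, because the mask below only keeps low bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // For a power-of-two length 2^n, (length - 1) is a mask of n one-bits,
    // so this keeps only the lowest n bits of the spread hash.
    static int indexFor(int h, int length) {
        return spread(h) & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // default HashMap capacity, 2^4
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            int h = key.hashCode();
            int idx = indexFor(h, length);
            // The mask gives the same result as a non-negative modulo, without a division.
            System.out.println(key + " -> hash " + h + " -> bucket " + idx
                    + " (matches floorMod: " + (idx == Math.floorMod(spread(h), length)) + ")");
        }
    }
}
```

The spreading step matters because a key whose hash codes differ only in the high bits would otherwise always land in the same bucket of a small table.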

2 answers

Generally the base data structure will indeed be an array.

The methods that need to find an entry (or an empty slot, when adding a new object) reduce the hash code to a value that fits the size of the array (generally by modulo) and use that value as an index into the array.
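A sketch of the generic "reduce by modulo" step for a table whose size is not a power of two. One detail worth showing: Java's `%` keeps the sign of the dividend, so a negative hash code would produce a negative, invalid index; `Math.floorMod` avoids that. The helper below is hypothetical and not taken from any particular library.

```java
// Generic "reduce the hash code to fit the array" step for a table of any size.
// Hypothetical helper; not taken from any particular library.
public class ModuloIndex {

    // Math.floorMod always returns a value in [0, tableSize) for tableSize > 0,
    // whereas h % tableSize is negative whenever h is negative.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 11; // deliberately not a power of two
        int h = -7;
        System.out.println("-7 % 11 = " + (h % tableSize));                     // -7: not a valid index
        System.out.println("floorMod(-7, 11) = " + Math.floorMod(h, tableSize)); // 4
        // Different hash codes can reduce to the same slot: 3 and 14 both map to 3.
        System.out.println("3 -> " + indexFor(3, tableSize) + ", 14 -> " + indexFor(14, tableSize));
    }
}
```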

Of course, this makes collisions more likely, since many objects can have hash codes that reduce to the same index (collisions were possible anyway, because multiple objects might have exactly the same hash code, but now they are much more likely). There are different strategies for dealing with this: generally either a linked-list-like structure per slot, or a mechanism for picking another slot if the first matching slot is occupied by a non-equal key.
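The second strategy, picking another slot, is usually called open addressing. Below is a minimal sketch using linear probing (try the next slot, wrapping around). It is a hypothetical class for illustration only: fixed capacity, no resizing, no removal, and null keys are not allowed. Java's HashMap does not work this way; it chains entries within a bucket instead.

```java
import java.util.Objects;

// Minimal open-addressing table with linear probing: if the home slot is occupied
// by a non-equal key, try the next slot. Hypothetical class; fixed capacity,
// no resizing, no removal, null keys not allowed.
public class LinearProbingMap<K, V> {
    private final Object[] keys;
    private final Object[] values;

    public LinearProbingMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // The slot the key would occupy if there were no collisions.
    private int home(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    public void put(K key, V value) {
        Objects.requireNonNull(key);
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i].equals(key)) { // empty slot, or same key: store here
                keys[i] = key;
                values[i] = value;
                return;
            }
            i = (i + 1) % keys.length; // occupied by a non-equal key: probe the next slot
        }
        throw new IllegalStateException("table full");
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return null;              // reached an empty slot: key is absent
            if (keys[i].equals(key)) return (V) values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingMap<Integer, String> m = new LinearProbingMap<>(8);
        m.put(1, "one");
        m.put(9, "nine"); // 9 % 8 == 1: collides with key 1, so it goes to slot 2
        System.out.println(m.get(1) + " " + m.get(9) + " " + m.get(17)); // one nine null
    }
}
```

Note that a lookup for an absent key (17 above) has to walk past every occupied slot in the cluster before it reaches an empty one, which is exactly the extra cost the next paragraph talks about.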

Since this adds cost, the more often such collisions happen, the slower things become; in the worst case a lookup is in fact O(n) (and a slow O(n) at that).

Increasing the size of the internal store will generally improve this, especially if the new size is not a multiple of the previous size (so that the operation that reduces the hash code to an index won't take a group of items that collided on one index and give them all the same index again). Some implementations increase the internal size before it is strictly necessary, while some empty space remains, under certain conditions (for example, once a certain percentage of the slots is filled, or after a certain number of collisions between objects whose full hash codes differ).

This means that unless the hash codes are very bad (most obviously, if they are all exactly the same), the cost of each operation stays at O(1) on average.
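A quick way to see the difference between good and very bad hash codes is to count how many buckets a set of keys actually occupies. The sketch below assumes the JDK 8 spread-and-mask indexing and uses a hypothetical `BadKey` class whose `hashCode()` always returns the same value, so all of its instances share one bucket and every lookup has to search among all of them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counts how many buckets of a table a set of keys occupies, using the same
// spread-and-mask indexing as the JDK 8 HashMap source (an assumption here).
public class BucketSpread {

    // Hypothetical key type with the worst possible hashCode: always the same value.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int hashCode() { return 42; }
    }

    static int index(Object key, int length) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (length - 1);
    }

    static int bucketsUsed(Iterable<?> keys, int length) {
        Set<Integer> used = new HashSet<>();
        for (Object k : keys) used.add(index(k, length));
        return used.size();
    }

    public static void main(String[] args) {
        List<Object> good = new ArrayList<>();
        List<Object> bad = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            good.add(i);            // Integer.hashCode() is the value itself
            bad.add(new BadKey(i)); // every key hashes to 42
        }
        System.out.println("good keys use " + bucketsUsed(good, 1024) + " buckets"); // 1000
        System.out.println("bad keys use " + bucketsUsed(bad, 1024) + " bucket(s)"); // 1
    }
}
```

With 1000 distinct buckets each lookup inspects roughly one entry; with one bucket, it has to search through all 1000.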

The structure of a Java HashMap is not just an array. It is an array, but not of 2^31 entries (int is a signed type!); instead it has a much smaller number of buckets, 16 by default initially. The Javadocs for HashMap explain this.

When the number of entries exceeds a certain fraction of the capacity (the "load factor", 0.75 by default), the array grows to a larger size: the table is rehashed so that it has approximately twice the number of buckets.
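A small sketch of that growth rule, assuming the defaults documented for HashMap (capacity 16, load factor 0.75) and the power-of-two masking shown in the question's update. The helper names are hypothetical. It also shows what doubling does to entries that collided: each one either keeps its old index or moves up by the old capacity, depending on one extra bit of its hash.

```java
// Sketch of load-factor-driven growth, assuming HashMap's documented defaults
// (capacity 16, load factor 0.75) and doubling on resize. Helper names are hypothetical.
public class ResizeSketch {
    static final float LOAD_FACTOR = 0.75f;

    // Number of entries allowed before the table grows.
    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // Power-of-two indexing on an already spread hash.
    static int index(int spreadHash, int capacity) {
        return spreadHash & (capacity - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        System.out.println("threshold(16) = " + threshold(oldCap)); // 12
        int newCap = oldCap * 2;
        // 5, 21, 37 and 53 all collide at index 5 in a 16-slot table.
        // After doubling, each stays at 5 or moves to 5 + 16 = 21.
        for (int h : new int[] {5, 21, 37, 53}) {
            System.out.println(h + ": " + index(h, oldCap) + " -> " + index(h, newCap));
        }
    }
}
```

So the 13th entry added to a default HashMap triggers a resize to 32 buckets, and the four colliding entries above end up split across two buckets.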

Each element of the array does not necessarily hold only one entry. Each element holds a structure of entries: a linked list, which since Java 8 is converted into a red-black tree once the bucket grows past a threshold. Every entry in that structure has a hash code that maps internally to the same bucket position in the array.
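Here is a minimal sketch of that bucket-of-entries layout (separate chaining). It is a hypothetical class, not HashMap's actual code: each array slot holds a `LinkedList` of entries whose hash codes reduce to that slot, and it omits resizing and the Java 8 conversion of large buckets to red-black trees.

```java
import java.util.LinkedList;
import java.util.Objects;

// Minimal separate-chaining table: an array of buckets, each a list of entries whose
// hash codes reduce to that bucket's index. Hypothetical class; it omits resizing and
// the Java 8 conversion of large buckets into red-black trees.
public class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedMap(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // Reduce the key's hash code to a bucket index (null keys hash to 0).
    private LinkedList<Entry<K, V>> bucketFor(Object key) {
        return buckets[Math.floorMod(Objects.hashCode(key), buckets.length)];
    }

    public void put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (Objects.equals(e.key, key)) { e.value = value; return; } // replace existing value
        }
        bucketFor(key).add(new Entry<>(key, value)); // new key: append to this bucket's chain
    }

    public V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    // Number of entries sharing the bucket that this key maps to.
    public int bucketSize(K key) {
        return bucketFor(key).size();
    }

    public static void main(String[] args) {
        ChainedMap<Integer, String> m = new ChainedMap<>(4);
        m.put(2, "two");
        m.put(6, "six"); // 6 % 4 == 2: same bucket as key 2
        m.put(2, "TWO"); // replaces the value, does not add an entry
        System.out.println(m.get(2) + " " + m.get(6) + " bucket size " + m.bucketSize(2)); // TWO six bucket size 2
    }
}
```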

Have you read the docs on this type? http://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

You really should.