
Query about internal implementation of HashMap

I am going through the HashMap implementation and referring to this link: How does Java implement hash tables? There I found that "a HashMap contains an array of buckets in order to contain its entries". So, I have a few questions:

  1. What is the type of the array of buckets?
  2. Arrays have drawbacks (e.g. a fixed size, and only homogeneous data is allowed). Why are arrays used despite these drawbacks?
  3. In case of the same hashcode for a key (a collision), a linked list is used. How does it get (search) the reference of the second, third node, etc.?

Thanks in advance.

  1. What is the type of the array of buckets?

That depends on the map you create: if you make a HashMap<Integer, String>, then the buckets will hold entries of those types.
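As a side note, in OpenJDK the backing array is declared as an array of an internal Node type, where each node carries the hash, key, value, and a link to the next node in the same bucket. A minimal illustrative sketch (my own simplification, not the real JDK source):

```java
// Illustrative sketch, not the real JDK source: the backing table is an
// array of internal Node entries.
public class SimpleBuckets<K, V> {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;   // chains entries that land in the same bucket
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Generic arrays cannot be created directly, so a raw array is
    // allocated and cast (this is also what the JDK itself does).
    @SuppressWarnings("unchecked")
    final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // default capacity

    public static void main(String[] args) {
        SimpleBuckets<String, Integer> map = new SimpleBuckets<>();
        System.out.println(map.table.length); // prints 16
    }
}
```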

  2. Arrays have drawbacks (e.g. a fixed size, and only homogeneous data is allowed). Why are arrays used despite these drawbacks?

Because the performance gain is worth the drawbacks. Since arrays have a fixed size, a lot of checks can be skipped (e.g. does this index exist?). You can read more about that here: https://en.wikiversity.org/wiki/Java_Collections_Overview and Why not always use ArrayLists in Java, instead of plain ol' arrays?

  3. In case of the same hashcode for a key (a collision), a linked list is used. How does it get (search) the reference of the second, third node, etc.?

That is explained here better than I can explain it: What happens when a duplicate key is put into a HashMap?

  1. It's an internal Object which contains the key, the value, and a reference to the next node in the bucket (to realize a singly linked list).
  2. The array has a fixed size that is always a power of 2. The index for a given key is computed with a bitwise AND (&) of the key's hashcode and (array length - 1), which is the actual "magic" of a hash table.
  3. The linked list in a bucket is needed to deal with hashcode collisions. This is the reason for the worst-case complexity of O(n) of HashMap.get(): it happens if all keys have the same hashcode and the searched key is the last one in the bucket.
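The index computation from the second point can be sketched like this (an illustration of the masking trick, not the JDK source; note the real HashMap additionally spreads the hashcode with h ^ (h >>> 16) before masking):

```java
public class BucketIndex {
    // Compute a bucket index the way a power-of-two-sized hash table does:
    // (length - 1) is an all-ones bit mask, so the AND keeps only the low
    // bits of the hash, yielding a value in [0, length).
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;                       // table length, always a power of 2
        int h = "hello".hashCode();
        int idx = indexFor(h, n);
        // For a power-of-two n, masking equals the non-negative remainder,
        // even for negative hashcodes.
        System.out.println(idx == Math.floorMod(h, n)); // prints "true"
    }
}
```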

If the HashMap grows, there is a very expensive rehash operation, because the array has to grow to the next power of 2 as well. A new array is constructed and the bucket index of every entry has to be recalculated. This is why no dynamically resizing data structure is needed for the table itself.

You can avoid rehashes by creating the HashMap with a suitable initial capacity argument.
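A quick sketch of pre-sizing (the 0.75 default load factor is documented JDK behavior; the exact sizing formula here is my own illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expected = 1000; // how many entries we plan to store
        // A HashMap resizes once size exceeds capacity * loadFactor (0.75
        // by default), so request a capacity that keeps 1000 entries below
        // that threshold; the constructor rounds the argument up to the
        // next power of 2 internally.
        Map<String, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // prints 1000
    }
}
```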

From the OpenJDK 8 source code:

  1. The bins are either lists or trees, depending on the number of elements they hold.
  2. The homogeneity of arrays isn't a problem in this context, and access speed outweighs the cost of resizing the array.
  3. The HashMap always iterates over all the values with the same hash, testing whether they have the correct key:
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
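To see that chain traversal in action, here is an illustrative key type (names are my own) with a deliberately constant hashCode, which forces every entry into the same bucket so that get() must walk the chain and compare keys with equals():

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A key whose hashCode is deliberately constant, so all instances
    // collide into one bucket (for demonstration only).
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);
        // Both keys share hash 42, yet each lookup still finds the right
        // entry, because getNode() walks the bucket's chain and checks
        // equals() on every node.
        System.out.println(map.get(new BadKey("a"))); // prints 1
        System.out.println(map.get(new BadKey("b"))); // prints 2
    }
}
```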
