
Why does HashMap resize based on total size instead of filled buckets?

I have a doubt in my mind:

Currently, HashMap in Java resizes when totalSize (the number of elements inserted) > arrayLength * loadFactor,

at which point it doubles the table and rehashes all key-value pairs.
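To make that concrete, here is a rough sketch of the trigger as I understand it (SketchMap is a made-up class, not the real java.util.HashMap source, and it only stores keys):

```java
import java.util.LinkedList;
import java.util.List;

// A simplified, made-up sketch of the resize trigger; not the real java.util.HashMap code.
class SketchMap<K> {
    private List<K>[] table = newTable(16);  // bucket array
    private final float loadFactor = 0.75f;
    private int size = 0;                    // total number of elements inserted

    public void add(K key) {
        // The trigger is the total size, not the number of filled buckets.
        if (size + 1 > table.length * loadFactor) {
            resize();
        }
        bucketFor(key, table).add(key);
        size++;
    }

    private void resize() {
        List<K>[] bigger = newTable(table.length * 2); // double the bucket array
        for (List<K> bucket : table) {                 // rehash every element into it
            for (K key : bucket) {
                bucketFor(key, bigger).add(key);
            }
        }
        table = bigger;
    }

    private static <K> List<K> bucketFor(K key, List<K>[] table) {
        return table[(key.hashCode() & 0x7fffffff) % table.length];
    }

    @SuppressWarnings("unchecked")
    private static <K> List<K>[] newTable(int capacity) {
        List<K>[] t = new List[capacity];
        for (int i = 0; i < capacity; i++) {
            t[i] = new LinkedList<>();
        }
        return t;
    }
}
```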

But suppose hashCode() in the Key class is hardcoded to, say, 1. Then every element will be inserted at index 1 and chained in a linked-list manner, yet the bucket array will still resize unnecessarily based on total size. So the bucket array keeps growing even though all elements go into the same bucket with such a hashCode() implementation.

My question is: should we not trigger the resize based on the number of filled buckets, instead of the total size?

I know such a hashCode() will hamper performance; I am asking this as a logical question.
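For illustration, a made-up key class with such a hardcoded hashCode(); BadKey and the demo below are hypothetical, only meant to show that all entries share one bucket while the table still grows with the total count:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key whose hashCode() is hardcoded to 1, so every key collides.
class BadKey {
    private final String name;

    BadKey(String name) { this.name = name; }

    @Override
    public int hashCode() { return 1; }  // constant hash code: all keys map to one bucket

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
    }
}

public class BadKeyDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>(); // default capacity 16, load factor 0.75
        for (int i = 0; i < 100; i++) {
            // All 100 entries end up in the same bucket, yet the table is still
            // resized several times because the total size crosses the threshold.
            map.put(new BadKey("key-" + i), i);
        }
        System.out.println(map.size()); // 100
    }
}
```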

HashMap has some code that attempts to improve bad hashCode() implementations, but it can do nothing to improve a terrible hashCode() implementation that always returns the same value.
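From memory of the OpenJDK 8+ sources, that improvement is roughly the following bit-spreading step (the exact code may differ between versions):

```java
// Roughly the bit-spreading step applied by java.util.HashMap (OpenJDK 8+):
// the high 16 bits are XORed into the low 16 bits so that they can influence
// the bucket index, which is computed as hash & (capacity - 1).
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// A constant hashCode() of 1 is unaffected: 1 ^ (1 >>> 16) == 1,
// so every key still lands in the same bucket.
```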

Such a hashCode() will give bad performance regardless of whether or not you resize the HashMap. Therefore, such bad usage of HashMap doesn't justify adding special logic as you suggest.

The assumption about the key's hashCode() implementation is that it will distribute the keys as close to uniformly as possible among the HashMap's bins. Therefore the average number of entries per bucket (the total number of entries divided by the number of buckets) gives a good estimate of when the HashMap should be resized, and the size of individual buckets doesn't need to be checked.

Imagine a hash map with 12 buckets and 9 items in it. Let's say that by coincidence, hashCode() only returns multiples of 3 - it's still a flawed hash code, but it's not a contrived edge case such as a constant hash code of 1. In this case, only four buckets (0, 3, 6, and 9) would be filled, holding two or three elements each.

With your approach, this hash map would never get resized, the collision lists would grow forever, and performance would suffer. However, if you resize it based on total size (with a load factor of 75%, that happens when adding the tenth element), you end up with a map of 24 buckets, 8 of which will be filled.
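A quick sketch of that arithmetic (this demo uses a plain modulo for the bucket index, matching the simplified example above; the real HashMap uses power-of-two capacities and hash & (capacity - 1)):

```java
import java.util.Set;
import java.util.TreeSet;

public class ResizeDistributionDemo {
    // Simplified bucket index, matching the example above; not the real HashMap formula.
    static int bucket(int hash, int capacity) {
        return hash % capacity;
    }

    public static void main(String[] args) {
        // Nine keys whose hash codes are all multiples of 3.
        int[] hashes = {0, 3, 6, 9, 12, 15, 18, 21, 24};

        for (int capacity : new int[] {12, 24}) {
            Set<Integer> occupied = new TreeSet<>();
            for (int h : hashes) {
                occupied.add(bucket(h, capacity));
            }
            System.out.println("capacity " + capacity + " -> occupied buckets " + occupied);
        }
        // capacity 12 -> occupied buckets [0, 3, 6, 9]
        // capacity 24 -> occupied buckets [0, 3, 6, 9, 12, 15, 18, 21]
    }
}
```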

Growing based on total size keeps the collision lists at a reasonable size with realistic, imperfect hash functions, because it's reasonable to expect every hash function to at least make a best attempt at distributing hash codes. That means growing a hash map will lead to more filled buckets than before, even if there are still clusters and empty buckets.

Basically, your suggestion is to optimize for memory use in an edge case, and not to optimize for access performance - i.e. the main purpose of maps - in the more likely cases.

If hashCode() always returns the same value:

  1. It is a bad implementation; there is no point in adding logic to support what should not be done.

  2. hashCode() may not be a constant function. HashMap has no way to know whether the hash function is constant or not, so it is wise to resize the HashMap anyway: if the hashCode() suddenly stops being constant, resizing may result in a better distribution of the values.
