I have a doubt in my mind:

Currently, HashMap in Java resizes when totalSize (the number of elements inserted) > arrayLength * loadFactor; it then doubles the table and rehashes all key-value pairs.

But suppose hashCode() in the key class is hardcoded to, say, 1. Then every element is inserted at index 1, chained in a linked-list manner, yet the bucket array will still resize based on total size. So the bucket array keeps growing unnecessarily while all elements go into the same bucket with such a hashCode() implementation.

My question: should we not trigger the resize based on the number of filled buckets, instead of the total size? I know such a hashCode() will hamper performance; I am asking this as a logical question.
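To make the scenario concrete, here is a minimal sketch (class and method names are mine, chosen for illustration) of a key whose hashCode() is hardcoded to 1, so every entry collides into one bucket while the map still resizes on total size:

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantHashDemo {
    // Hypothetical key class, as described in the question:
    // hashCode() always returns 1, so every key collides.
    static final class BadKey {
        private final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 1; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Fills a HashMap with n colliding keys; all land in one bucket,
    // yet the table still doubles each time size crosses capacity * loadFactor.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        // Lookups remain correct, just slow: with all hash codes equal,
        // HashMap must search within that single bucket (a linked list,
        // or since Java 8 a treeified bin once it grows large enough).
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(500))); // 500
    }
}
```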
HashMap has some code that attempts to improve bad hashCode() implementations, but it can do nothing to improve a terrible hashCode() implementation that always returns the same value. Such a hashCode() will give bad performance regardless of whether or not you resize the HashMap. Therefore, such bad usage of HashMap doesn't justify adding special logic as you suggest.
The assumption about the key's hashCode() implementation is that it distributes the keys as close to uniformly as possible among the HashMap bins. Therefore the average number of entries per bucket (the total number of entries divided by the number of buckets) gives a good estimate of when the HashMap should be resized, and the sizes of individual buckets don't need to be checked.
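That rule can be sketched as a simplified model (not HashMap's actual source): with the default capacity of 16 and load factor of 0.75, the table doubles once the size exceeds capacity * loadFactor.

```java
public class ResizeThreshold {
    // Simplified model of HashMap's growth rule: the table doubles
    // when size exceeds capacity * loadFactor, i.e. when the *average*
    // bucket occupancy exceeds the load factor.
    static int capacityAfter(int inserts) {
        int capacity = 16;          // HashMap's default initial capacity
        float loadFactor = 0.75f;   // HashMap's default load factor
        for (int size = 1; size <= inserts; size++) {
            if (size > capacity * loadFactor) {
                capacity *= 2;      // double the table
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16: 12 is not above 16 * 0.75
        System.out.println(capacityAfter(13)); // 32: 13 > 12 triggers a resize
    }
}
```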
Imagine a hash map with 12 buckets and 9 items in it. Let's say that by coincidence, hashCode() only returns multiples of 3 - it's still a flawed hash code, but not a contrived edge case such as a constant hash code of 1. In this case, only four buckets (0, 3, 6, and 9) would be filled, each with one or two elements.

With your approach, this hash map would never be resized, the collision lists would grow forever, and performance would suffer. However, if you resize based on total size - with a load factor of 75%, that happens when adding the tenth element - you end up with a map of 24 buckets, 8 of which are filled.
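Those bucket counts can be checked with a small simulation. It uses hash % capacity as the bucket index; the real HashMap uses hash & (capacity - 1) with power-of-two capacities, so a 12-bucket table is hypothetical here, as in the example above.

```java
import java.util.HashSet;
import java.util.Set;

public class MultiplesOfThree {
    // Counts how many distinct buckets are occupied when `keys` entries
    // with hash codes 0, 3, 6, ... are placed into `capacity` buckets.
    static int bucketsUsed(int keys, int capacity) {
        Set<Integer> used = new HashSet<>();
        for (int k = 0; k < keys; k++) {
            used.add((3 * k) % capacity);  // hash codes are multiples of 3
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(bucketsUsed(9, 12));  // 4 buckets: 0, 3, 6, 9
        System.out.println(bucketsUsed(10, 24)); // 8 buckets after resizing to 24
    }
}
```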
Growing based on total size keeps the collision lists at a reasonable size for realistic, imperfect hash functions, because it's reasonable to expect every hash function to at least make a best attempt at distributing hash codes. That means growing the hash map will lead to more filled buckets than before, even if there are still clusters and empty buckets.

Basically, your suggestion optimizes for memory use in an edge case, and not for access performance - ie the main purpose of maps - in the more likely cases.
If hashCode() always returns the same value, it is a bad implementation; there is no point in adding logic to support what should not be done in the first place.
Besides, hashCode() may not remain a constant function, and HashMap has no way to know whether the hash function is constant or not, so it is wise to resize the HashMap anyway: if the hashCode() implementation later becomes non-constant, resizing may result in a better distribution of the values.