
Why does HashMap resize based on total size instead of filled buckets?

I have a doubt in my mind:

Currently, HashMap in Java resizes when totalSize (the number of elements inserted) > arrayLength * loadFactor,

at which point it doubles the table and rehashes all key-value pairs.
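To make that concrete, here is a rough sketch of the trigger as I understand it (SketchMap is a made-up class, not the real java.util.HashMap source, and it only stores keys):

```java
import java.util.LinkedList;
import java.util.List;

// A simplified, made-up sketch of the resize trigger; not the real java.util.HashMap code.
class SketchMap<K> {
    private List<K>[] table = newTable(16);  // bucket array
    private final float loadFactor = 0.75f;
    private int size = 0;                    // total number of elements inserted

    public void add(K key) {
        // The trigger is the total size, not the number of filled buckets.
        if (size + 1 > table.length * loadFactor) {
            resize();
        }
        bucketFor(key, table).add(key);
        size++;
    }

    private void resize() {
        List<K>[] bigger = newTable(table.length * 2); // double the bucket array
        for (List<K> bucket : table) {                 // rehash every element into it
            for (K key : bucket) {
                bucketFor(key, bigger).add(key);
            }
        }
        table = bigger;
    }

    private static <K> List<K> bucketFor(K key, List<K>[] table) {
        return table[(key.hashCode() & 0x7fffffff) % table.length];
    }

    @SuppressWarnings("unchecked")
    private static <K> List<K>[] newTable(int capacity) {
        List<K>[] t = new List[capacity];
        for (int i = 0; i < capacity; i++) {
            t[i] = new LinkedList<>();
        }
        return t;
    }
}
```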

But suppose hashCode() in the Key class is hardcoded to, say, 1. Then every element will be inserted at index 1 and chained in a linked-list manner, yet the bucket array will still resize unnecessarily based on total size. So the bucket array keeps growing even though all elements go into the same bucket with such a hashCode() implementation.

My question is: should we not trigger the resize based on the number of filled buckets, instead of the total size?

I know such a hashCode() will hamper performance; I am asking this as a logical question.
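For illustration, a made-up key class with such a hardcoded hashCode(); BadKey and the demo below are hypothetical, only meant to show that all entries share one bucket while the table still grows with the total count:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key whose hashCode() is hardcoded to 1, so every key collides.
class BadKey {
    private final String name;

    BadKey(String name) { this.name = name; }

    @Override
    public int hashCode() { return 1; }  // constant hash code: all keys map to one bucket

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
    }
}

public class BadKeyDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>(); // default capacity 16, load factor 0.75
        for (int i = 0; i < 100; i++) {
            // All 100 entries end up in the same bucket, yet the table is still
            // resized several times because the total size crosses the threshold.
            map.put(new BadKey("key-" + i), i);
        }
        System.out.println(map.size()); // 100
    }
}
```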

HashMap has some code that attempts to improve bad hashCode() implementations, but it can do nothing to improve a terrible hashCode() implementation that always returns the same value.
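From memory of the OpenJDK 8+ sources, that improvement is roughly the following bit-spreading step (the exact code may differ between versions):

```java
// Roughly the bit-spreading step applied by java.util.HashMap (OpenJDK 8+):
// the high 16 bits are XORed into the low 16 bits so that they can influence
// the bucket index, which is computed as hash & (capacity - 1).
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// A constant hashCode() of 1 is unaffected: 1 ^ (1 >>> 16) == 1,
// so every key still lands in the same bucket.
```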

Such a hashCode() will give bad performance regardless of whether or not you resize the HashMap. Therefore, such bad usage of HashMap doesn't justify adding special logic as you suggest.

The assumption about the key's hashCode() implementation is that it will distribute the keys as close to uniformly as possible among the HashMap's bins. Therefore the average number of entries per bucket (the total number of entries divided by the number of buckets) gives a good estimate of when the HashMap should be resized, and the size of individual buckets doesn't need to be checked.

Imagine a hash map with 12 buckets and 9 items in it. Let's say that by coincidence, hashCode() only returns multiples of 3 - it's still a flawed hash code, but it's not a contrived edge case such as a constant hash code of 1. In this case, only four buckets (0, 3, 6, and 9) would be filled, holding two or three elements each.

With your approach, this hash map would never get resized, the collision lists would grow forever, and performance would suffer. However, if you resize it based on total size (with a load factor of 75%, that happens when adding the tenth element), you end up with a map of 24 buckets, 8 of which will be filled.
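A quick sketch of that arithmetic (this demo uses a plain modulo for the bucket index, matching the simplified example above; the real HashMap uses power-of-two capacities and hash & (capacity - 1)):

```java
import java.util.Set;
import java.util.TreeSet;

public class ResizeDistributionDemo {
    // Simplified bucket index, matching the example above; not the real HashMap formula.
    static int bucket(int hash, int capacity) {
        return hash % capacity;
    }

    public static void main(String[] args) {
        // Nine keys whose hash codes are all multiples of 3.
        int[] hashes = {0, 3, 6, 9, 12, 15, 18, 21, 24};

        for (int capacity : new int[] {12, 24}) {
            Set<Integer> occupied = new TreeSet<>();
            for (int h : hashes) {
                occupied.add(bucket(h, capacity));
            }
            System.out.println("capacity " + capacity + " -> occupied buckets " + occupied);
        }
        // capacity 12 -> occupied buckets [0, 3, 6, 9]
        // capacity 24 -> occupied buckets [0, 3, 6, 9, 12, 15, 18, 21]
    }
}
```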

Growing based on total size keeps the collision lists at a reasonable size with realistic, imperfect hash functions, because it's reasonable to expect every hash function to at least make a best attempt at distributing hash codes. That means growing a hash map will lead to more filled buckets than before, even if there are still clusters and empty buckets.

Basically, your suggestion is to optimize for memory use in an edge case, and not to optimize for access performance - i.e. the main purpose of maps - in the more likely cases.

If hashCode() always returns the same value:

  1. It is a bad implementation; there is no point in adding logic to support what should not be done.

  2. hashCode() may not be a constant function. HashMap has no way to know whether the hash function is constant or not, so it is wise to resize the HashMap anyway: if the hashCode() suddenly stops being constant, resizing may result in a better distribution of the values.
