
number of hash buckets

In the HashMap documentation, it is mentioned that:

  • the initial capacity is simply the capacity at the time the hash table is created
  • the capacity is the number of buckets in the hash table.

Now suppose we have the default initial capacity of 16, and we keep adding elements until there are 100 of them; is the capacity of the HashMap then 100 * loadfactor?

Will the number of hash buckets be 100 or 16?

Edit:
From the solution I read, there are more buckets than elements added. Taking this as the viewpoint: if we add Strings as keys, we will end up with roughly one element per bucket, resulting in a lot of space consumption/complexity. Is my understanding right?

Neither 100 nor 16 buckets. Most likely there will be 256 buckets, but this isn't guaranteed by the documentation.

From the updated documentation link:

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

(emphasis mine)

So, if we ignore the word "approximately" above, we determine that whenever the hash table becomes 75% full (or whichever load factor you specify in the constructor), the number of hash buckets doubles. That means that the number of buckets doubles whenever you insert the 12th, 24th, 48th, and 96th elements, leaving a total of 256 buckets.

However, as I emphasized in the documentation snippet, the new size is only approximately twice the previous size, so it may not be exactly 256. In fact, if the second-to-last doubling is replaced with a slightly larger increase, the last doubling may never happen, so the final hash table may have as few as 134 buckets, or more than 256.

NB I arrived at the 134 number because it's the smallest integer N such that 0.75 * N > 100.
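To double-check that 134 figure, here is a minimal sketch (plain arithmetic only, nothing HashMap-specific; the class name is just for illustration) that searches for the smallest N whose threshold 0.75 * N exceeds 100:

    public class SmallestNoRehashCapacity {
        public static void main(String[] args) {
            int n = 1;
            while (0.75 * n <= 100) {   // keep growing while the threshold does not exceed 100
                n++;
            }
            System.out.println(n);      // prints 134, because 0.75 * 134 = 100.5 > 100
        }
    }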

Looking at the source code of HashMap we see the following:

// simplified pseudocode of the resize check, not the literal JDK source
threshold = capacity * loadFactor     // e.g. 16 * 0.75 = 12
size = number of entries currently in the map

if (size >= threshold) {
    capacity = capacity * 2           // the table is rebuilt (rehashed) with twice as many buckets
}

Thus, if the initial capacity is 16 and your load factor is 0.75 (the default), the initial threshold will be 12. When you add the 12th element, the capacity rises to 32 with a threshold of 24. The next step is capacity 64 with threshold 48, and so on.

So with 100 elements, you should have a capacity of 256 and a threshold of 192.
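That walkthrough can be reproduced with a minimal simulation of the simplified check above (an illustration only, assuming the size >= threshold condition; real HashMap internals differ slightly between JDK versions, and the class name is arbitrary):

    public class GrowthSimulation {
        public static void main(String[] args) {
            int capacity = 16;
            double loadFactor = 0.75;
            int threshold = (int) (capacity * loadFactor);   // 12

            for (int size = 1; size <= 100; size++) {        // "insert" 100 entries
                if (size >= threshold) {                     // simplified resize condition
                    capacity *= 2;
                    threshold = (int) (capacity * loadFactor);
                    System.out.println("resize at size " + size
                            + " -> capacity " + capacity + ", threshold " + threshold);
                }
            }
            // prints resizes at sizes 12, 24, 48 and 96, ending at capacity 256 / threshold 192
        }
    }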

Note that this applies only to the default values. If you know the approximate number of elements your map will contain, you should create it with a high enough initial capacity to avoid the rehashing and copying that happen every time the capacity is increased.

Update:

A word on the capacity: it will always be a power of two, even if you specify a different initial capacity. The HashMap will set the capacity to the smallest power of 2 that is greater than or equal to the provided initial capacity.
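To illustrate that rounding, here is a small sketch (this is not the JDK's internal implementation, just an equivalent computation for cap >= 1; the names are mine):

    public class PowerOfTwoRounding {
        // round cap up to the nearest power of two (assumes cap >= 1)
        static int roundUpToPowerOfTwo(int cap) {
            int highest = Integer.highestOneBit(cap);
            return (highest == cap) ? cap : highest << 1;
        }

        public static void main(String[] args) {
            System.out.println(roundUpToPowerOfTwo(16));   // 16
            System.out.println(roundUpToPowerOfTwo(17));   // 32
            System.out.println(roundUpToPowerOfTwo(100));  // 128
        }
    }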

From your link:

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.

That means if we have an initial capacity of 16 and its threshold is exceeded, the capacity is doubled to 32, the next time to 64, and so on.

In your case, you are adding 100 entries. With the default load factor of 0.75, the table is resized when the 12th entry is added (capacity 32), again at the 24th (capacity 64), at the 48th (capacity 128) and at the 96th (capacity 256). Thus, in your case, the total number of buckets is 256.

You are going to end up with at least as many buckets as items actually stored. If you add more items than the current threshold allows (12 with the default capacity of 16 and load factor 0.75), the table must be resized and rehashed.

Now suppose we have the default initial capacity of 16, and we keep adding elements until there are 100 of them; is the capacity of the HashMap then 100 * loadfactor?

Actually it says:

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

I.e., if there is a maximum of 100 items, the initial capacity must be greater than 100 / 0.75 ≈ 133.3, so at least 134, and then no re-hashing should ever occur. Notice that this implies that even if the table is not full, it may have to be rehashed when it gets close to full. So the ideal initial capacity to set with the default load factor, if you expect <= 100 items, is about 134 or more.
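A hedged sketch of what that pre-sizing could look like (the +1 is just a small safety margin at the "exceeds" boundary, and HashMap will still round the capacity up to a power of two internally; names and values are for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class PreSizedMap {
        public static void main(String[] args) {
            int expectedEntries = 100;
            float loadFactor = 0.75f;

            // ceil(100 / 0.75) = 134; add 1 as a safety margin -> 135
            int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor) + 1;

            Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);
            for (int i = 0; i < expectedEntries; i++) {
                map.put(i, "value-" + i);   // filling up to 100 entries should trigger no rehash
            }
        }
    }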

In the doc:

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.

threshold = product of the load factor and the current capacity

Let's try it: the initial capacity of a HashMap is 16 and the default load factor is 0.75, so the 1st threshold is 12; after adding the 12th element the next capacity will be 16 * 2 = 32. The 2nd threshold is 24, so after adding the 24th element the next capacity will be 32 * 2 = 64, and so on.
