
How is a HashMap in Java populated when the load factor is more than 1?

I tried to create a HashMap as follows:

HashMap<Integer,String> test = new HashMap<Integer,String>();
test.put(1, "Value1");
test.put(2, "Value2");
test.put(3, "Value3");
test.put(4, "Value4");
test.put(5, "Value5");
test.put(6, "Value6");
test.put(7, "Value7");
test.put(8, "Value8");
test.put(9, "Value9");
test.put(10, "Value10");
test.put(11, "Value11");
test.put(12, "Value12");
test.put(13, "Value13");
test.put(14, "Value14");
test.put(15, "Value15");
test.put(16, "Value16");
test.put(17, "Value17");
test.put(18, "Value18");
test.put(19, "Value19");
test.put(20, "Value20");

and I saw that every entry was put into a different bucket, which means a different bucket index was computed for each key. Now, if I modify my code as follows:

HashMap<Integer,String> test = new HashMap<Integer,String>(16,2.0f);
test.put(1, "Value1");
test.put(2, "Value2");
test.put(3, "Value3");
test.put(4, "Value4");
test.put(5, "Value5");
test.put(6, "Value6");
test.put(7, "Value7");
test.put(8, "Value8");
test.put(9, "Value9");
test.put(10, "Value10");
test.put(11, "Value11");
test.put(12, "Value12");
test.put(13, "Value13");
test.put(14, "Value14");
test.put(15, "Value15");
test.put(16, "Value16");
test.put(17, "Value17");
test.put(18, "Value18");
test.put(19, "Value19");
test.put(20, "Value20");

I find that some of the entries which previously went into different buckets are now placed in a bucket that already contains other entries, even though their hash values are different. Can anyone please help me understand this?

Thanks

So, if you initialize a HashMap without specifying an initial capacity and a load factor, it gets initialized with a capacity of 16 and a load factor of 0.75. This means that once the HashMap holds at least (initial capacity * load factor) entries, i.e. 12 entries, it is rehashed: it grows to about twice the size and all elements are reinserted.

You now set the load factor to 2, which means the Map will only get rehashed once it holds at least 32 elements.
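As a rough sketch of that threshold rule (the class name ThresholdDemo is just for illustration, this is not the real HashMap internals):

import java.util.HashMap;

public class ThresholdDemo {
    public static void main(String[] args) {
        // Sketch: a HashMap resizes once its entry count exceeds
        // capacity * loadFactor (the "threshold").
        int capacity = 16;

        float defaultLoadFactor = 0.75f;
        float customLoadFactor = 2.0f;

        System.out.println("Default threshold: " + (int) (capacity * defaultLoadFactor)); // 12
        System.out.println("Custom threshold:  " + (int) (capacity * customLoadFactor));  // 32
    }
}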

What happens now is that elements with the same (hash mod bucket count) are put into the same bucket. Each bucket with more than one element holds a list containing all of its elements. When you look up one of those elements, the bucket is found first using the hash; then the whole list in that bucket has to be traversed to find the key that matches exactly. This is quite costly.
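A minimal sketch of that per-bucket scan, using a hypothetical BucketSketch class (the real HashMap uses its own node structure and, since Java 8, turns crowded buckets into trees):

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified sketch of one bucket holding colliding entries in a list.
public class BucketSketch {

    static class Entry {
        final Integer key;
        final String value;
        Entry(Integer key, String value) { this.key = key; this.value = value; }
    }

    static final List<Entry> bucket = new ArrayList<>();

    // Lookup has to walk the whole list and compare keys one by one,
    // which is what makes long buckets expensive.
    static String get(Integer key) {
        for (Entry e : bucket) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // With 16 buckets, keys 1 and 17 would land in the same bucket.
        bucket.add(new Entry(1, "Value1"));
        bucket.add(new Entry(17, "Value17"));
        System.out.println(get(17)); // prints Value17 after scanning the list
    }
}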

So in the end there is a trade-off: rehashing is pretty expensive, so you should try to avoid it. On the other hand, having multiple elements in a bucket makes lookups expensive, so you should try to avoid that as well. You need a balance between the two. Another option is to set the initial capacity quite high, but that takes up more memory that is never used.
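One way to do that presizing, shown as a sketch (the class name PresizeDemo is hypothetical), is to derive the initial capacity from the expected number of entries and the load factor, following the rule quoted from the javadoc further down:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 20;
        float loadFactor = 0.75f;

        // Choosing an initial capacity of at least expectedEntries / loadFactor
        // means the threshold is never crossed, so no rehash happens.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

        Map<Integer, String> test = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 1; i <= expectedEntries; i++) {
            test.put(i, "Value" + i);
        }
        System.out.println(test.size()); // 20 entries inserted without any resize
    }
}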

In your second test, the initial capacity is 16 and the load factor is 2. This means the HashMap will use an array of 16 elements to store the entries (i.e. there are 16 buckets), and this array will be resized only when the number of entries in the map reaches 32 (16 * 2).

This means that some keys having different hashCodes must be stored in the same bucket, since the number of buckets (16) is smaller than the total number of entries (20 in your case).

The assignment of a key to a bucket is calculated in 3 steps:

  1. First the hashCode method is called.
  2. Then an additional function is applied on the hashCode to reduce the damage that may be caused by bad hashCode implementations.
  3. Finally, a modulo operation is applied to the result of the previous step to get a number between 0 and capacity - 1.

The 3rd step is where keys having different hashCodes may end up in the same bucket.
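The three steps can be sketched roughly like this, modeled on the OpenJDK Java 8 implementation (the class name BucketIndexDemo is just for illustration, and the bit-spreading detail may differ across Java versions):

public class BucketIndexDemo {
    // Step 2: spread the high bits of the hashCode downwards, as the
    // Java 8 HashMap does, to soften poor hashCode implementations.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Step 3: reduce the spread hash to a bucket index. Because the
    // capacity is always a power of two, "hash & (capacity - 1)" is
    // equivalent to "hash % capacity" for non-negative hashes.
    static int bucketIndex(Object key, int capacity) {
        return spread(key.hashCode()) & (capacity - 1); // Step 1: key.hashCode()
    }

    public static void main(String[] args) {
        // Different hashCodes, same bucket when there are only 16 buckets.
        System.out.println(bucketIndex(1, 16));  // 1
        System.out.println(bucketIndex(17, 16)); // 1
    }
}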

Let's check this with examples:

i) In the first case, the load factor is 0.75f and the initial capacity is 16, which means an array resize will occur when the number of entries in the HashMap reaches 16 * 0.75 = 12.

Now, every key has a different hashCode, and hashCode modulo 16 is unique for them, which causes the first 12 entries to go to different buckets. After that a resize occurs, and the new entries also end up in different buckets (hashCode modulo 32 being unique).

ii) In the second case, the load factor is 2.0f, which means a resize will happen when the number of entries reaches 16 * 2 = 32. You keep putting entries into the map and it never resizes (for the 20 entries), so multiple entries collide.

So, in a nutshell: in the first example, hashCode modulo 16 is unique for the first 12 entries and hashCode modulo 32 is unique for all entries, while in the second case it is always hashCode modulo 16 for all entries, which is not unique (it cannot be, as all 20 entries have to be accommodated in 16 buckets).
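You can check this claim for Integer keys (whose hashCode is the int value itself) with a quick loop; the class name ModuloCheck is just for illustration:

public class ModuloCheck {
    public static void main(String[] args) {
        // Integer.hashCode() is the int value, so the bucket index of key k
        // is effectively k modulo the number of buckets here.
        System.out.println("key | % 16 | % 32");
        for (int key = 1; key <= 20; key++) {
            System.out.printf("%3d | %4d | %4d%n", key, key % 16, key % 32);
        }
        // With 16 buckets, keys 1..4 collide with 17..20; with 32 buckets
        // (after the resize in the first test) all 20 indices are distinct.
    }
}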

The javadoc explanation:

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

By default, the initial capacity is 16 and the load factor is 0.75. So when the number of entries goes beyond 12 (16 * 0.75), the capacity is increased to 32 and the hash table is rehashed. That is why, in your first case, every element ends up in its own bucket.

In your second case, only when the number of entries crosses 32 (16 * 2) will the hash table be resized. Even if the elements have different hash code values, collisions may occur when hashCode % bucketCount is calculated. That is the reason you are seeing more than one element in the same bucket.
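A small sketch of that second setup (the class name SecondTestDemo is hypothetical) shows that the collisions only affect how lookups traverse a bucket, not correctness:

import java.util.HashMap;

public class SecondTestDemo {
    public static void main(String[] args) {
        // Same setup as the second test: 16 buckets, load factor 2, so no
        // resize happens for 20 entries and some keys share a bucket.
        HashMap<Integer, String> test = new HashMap<>(16, 2.0f);
        for (int i = 1; i <= 20; i++) {
            test.put(i, "Value" + i);
        }

        // Keys 4 and 20 map to the same bucket (4 % 16 == 20 % 16 == 4),
        // yet lookups still return the right values: the bucket's list is
        // scanned and keys are compared with equals().
        System.out.println(test.get(4));  // Value4
        System.out.println(test.get(20)); // Value20
    }
}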
