
Load factor with chained Hash Set

As an assignment, I was asked to implement a chained hash set:

The set is backed by an array of linked lists (I'll call it A[]), and if two different values get the same hash value k, they are both added to the list A[k].

The structure works with a bounded Load Factor (in the interval [0.25, 0.75]).

In the instructions they told us to calculate the Load Factor as:

Load Factor = size/capacity

where "size" is the total number of elements currently in the set and "capacity" is the array's length (A.length).
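To make the definition concrete, here is a minimal Java sketch of that calculation together with the [0.25, 0.75] bounds. The class and method names are made up for illustration, and whether a resize triggers strictly above or below the bounds is an assumption on my part, not something the instructions state.

```java
// Illustrative sketch only: names and the strict comparisons are assumptions.
final class LoadFactorSketch {
    static final double LOWER = 0.25; // lower bound from the assignment
    static final double UPPER = 0.75; // upper bound from the assignment

    // Load Factor = size / capacity, computed in floating point.
    static double loadFactor(int size, int capacity) {
        return (double) size / capacity;
    }

    // Assumed policy: grow the backing array once the upper bound is exceeded.
    static boolean shouldGrow(int size, int capacity) {
        return loadFactor(size, capacity) > UPPER;
    }

    // Assumed policy: shrink the backing array once we drop below the lower bound.
    static boolean shouldShrink(int size, int capacity) {
        return loadFactor(size, capacity) < LOWER;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(12, 16));   // 0.75
        System.out.println(shouldGrow(13, 16));   // true
        System.out.println(shouldShrink(3, 16));  // true
    }
}
```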

I think this definition of "size" isn't appropriate in this case, and that it should instead be the number of non-empty lists in A.

For example, if all values are mapped to the same cell, say A[1], then when rehashing according to the Load Factor we'll make the backing array A larger when actually only one cell is used.

Does anyone see any mistake in my logic here?

Hashes are usually reduced modulo the array length to get array indices, so when you increase the size of the array it's quite likely that elements which shared a linked list won't all end up in the same list again (at least they shouldn't if you use a proper hash function).
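A small Java sketch of that effect, assuming the index is simply the hash modulo the capacity (the names here are hypothetical, and the hash values are hand-picked so that they all collide at capacity 16):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: assumes index = hash mod capacity.
final class RehashSketch {
    // floorMod keeps the index non-negative even for negative hashes.
    static int indexFor(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    // Bucket index of every hash for a given capacity.
    static List<Integer> indices(int[] hashes, int capacity) {
        List<Integer> out = new ArrayList<>();
        for (int h : hashes) {
            out.add(indexFor(h, capacity));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] hashes = {1, 17, 33, 49};
        System.out.println(indices(hashes, 16)); // [1, 1, 1, 1]
        System.out.println(indices(hashes, 32)); // [1, 17, 1, 17]
    }
}
```

With capacity 16 all four hashes share bucket 1; after doubling to 32 they split across buckets 1 and 17. If the hash values themselves were identical, no resize would separate them, which is the bad-hash-function case.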

Also, under your definition the meaning of the load factor would change rather significantly. As defined in the assignment, it gives the average number of items per linked list, which is a very important number, because it determines how long it takes (on average) to retrieve an item.
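To see why size/capacity is the average list length, here is a rough Java sketch that builds the chains and averages their lengths over all cells, empty ones included. The class name, the use of `Integer.hashCode`, and the modulo indexing are my assumptions, not part of the assignment.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: a bare-bones chained structure for Integer values.
final class ChainLengthSketch {
    // Build `capacity` chains and add each value to the chain its hash selects.
    static List<LinkedList<Integer>> build(int capacity, int[] values) {
        List<LinkedList<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
        for (int v : values) {
            int index = Math.floorMod(Integer.hashCode(v), capacity);
            LinkedList<Integer> chain = buckets.get(index);
            if (!chain.contains(v)) { // set semantics: no duplicates
                chain.add(v);
            }
        }
        return buckets;
    }

    // Total number of elements divided by the number of chains.
    static double averageChainLength(List<LinkedList<Integer>> buckets) {
        int total = 0;
        for (LinkedList<Integer> chain : buckets) {
            total += chain.size();
        }
        return (double) total / buckets.size();
    }

    public static void main(String[] args) {
        List<LinkedList<Integer>> buckets = build(8, new int[] {0, 1, 2, 3, 4, 5});
        // 6 elements over 8 chains: the same number as size / capacity.
        System.out.println(averageChainLength(buckets)); // 0.75
    }
}
```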

For better or worse, hash-tables count on a decent distribution of hashes, so it's assumed that one list wouldn't get too large in proportion to the others.

It can also make sense to track the number of indices in use as an indication of the quality of the hash function, but I don't think there's much point. There's not much the hash set implementation can do about a bad hash function (it doesn't define the hash function, it just calls it), and having the calling code switch hash functions dynamically when the current one turns out to be bad doesn't seem very practical.
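If you did want that diagnostic, a sketch in Java could look like the following. Everything here is hypothetical naming, and the input is a deliberately degenerate set of distinct elements whose hashes all land in A[1], as in the question's example.

```java
// Illustrative sketch only: counts chain lengths from raw hash values.
final class BucketUsageSketch {
    // lengths[i] = number of elements whose hash maps to index i.
    static int[] chainLengths(int[] hashes, int capacity) {
        int[] lengths = new int[capacity];
        for (int h : hashes) {
            lengths[Math.floorMod(h, capacity)]++;
        }
        return lengths;
    }

    // Number of non-empty cells, i.e. the "size" proposed in the question.
    static int usedBuckets(int[] lengths) {
        int used = 0;
        for (int len : lengths) {
            if (len > 0) {
                used++;
            }
        }
        return used;
    }

    // Length of the longest chain, i.e. the worst-case lookup cost.
    static int longestChain(int[] lengths) {
        int max = 0;
        for (int len : lengths) {
            max = Math.max(max, len);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] hashes = {1, 9, 17, 25, 33, 41}; // all map to index 1 when capacity is 8
        int[] lengths = chainLengths(hashes, 8);
        System.out.println((double) hashes.length / 8); // 0.75
        System.out.println(usedBuckets(lengths));       // 1
        System.out.println(longestChain(lengths));      // 6
    }
}
```

Here size/capacity is 0.75 while only one cell is in use, so the used-cell count flags the skew, but it says nothing the set itself could act on.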

For example, if all values are mapped to the same cell, say A[1], then when rehashing according to the Load Factor we'll make the backing array A larger when actually only one cell is used.

I think an implicit assumption is that you are using a good hash function.
