简体   繁体   中英

load factor in separate chaining?

Why is it recommended to have a load factor of 1 in Separate Chaining?

I've seen plenty of people saying that it is recommended, but not given a clear explanation of why .

In Open addressing, I know the load factor should be between 0.5 and 0.7 because it should be a fast operation to find an unoccupied index when dealing with collisions. But I can't see why a load factor of 1 should be better in Separate Chaining. I mean, if I have a table of size 100, isn't there still a chance that all 100 elements hashes to the same index and get placed in the same list? God, I really can't comprehend why this specific load factor for Separate Chaining should be 1.

tl;dr: To save memory space by not having slots unoccupied and speed up access by minimizing the number of list traversal operation.

If you understand the load factor as n_used_slots / n_total_slots :

Having a load factor of 1 just describes the ideal situation for a well-implemented hash table using Separate Chaining collision handling: no slots are left empty.

The other classical approach, Open Addressing, requires the table to always have a free slot available when adding a new item. Resizing the table is way too costly to do it for each item, but we are also restricted on memory and wouldn't want to have too many unused slots lying around. One has to find a balance between speed (few table resizes, quick inserts and lookups) and memory (few empty slots) [as ever so often in programming]. The ideal load factor is based on this idea of balancing and can be estimated based on the actual hash function, the value domain and other factors.

With Separate Chaining, on the other hand, we usually expect from the start to have (way) more items than available hash table slots. If a collision occurs, we need to add the item to the linked list stored in a specific slot. Since searching in a linked list is costly, we would like to minimize list traversal operations. For that, the best case is to have all slots filled with lists of ideally the same length! Having all slots filled corresponds to a load factor of 1.

To put it another way: A load factor < 1 means that there are empty slots and items had to be added to a linked list in another slot, increasing the number of list traversal operations and wasting some memory.

Concerning your example of a table with size 100: yes, there is a chance that all items collide and occupy just one single slot. In that case, the effective load factor would be 0.01 and performance would be heavily impacted.

If you understand the load factor as n_items / n_total_slots :

In that case, the load factor can be larger than 1. A factor < 1 means you have empty slots, while factor > 1 means that there are slots holding more than one item and consequently, list traversals are required. In the first case, you are wasting space and in the second case list traversals lead to a (small) performance hit, depending on the size of the lists.

Example: A load factor of 10 means that on average each slot holds 10 items. Searching for an item therefore implies traversing 5 list nodes on average .

A load factor of 1 means you waste no space and have the fastest lookup, if you use a decent hash function that ensures a regular and evenly balanced usage of slots.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM