
Relation between the load factor and time complexity in hash tables?

Regarding hash tables, we measure the performance of a hash table using its load factor. But I need to understand the relationship between the load factor and the time complexity of the hash table. According to my understanding, the two are directly proportional. That is, computing the hash function to find the index takes just O(1). If the load factor is low, there are not many elements in the table, so the chance of finding the key-value pair at its expected index is high, the search work is minimal, and the complexity stays constant. On the other hand, when the load factor is high, the chance of finding the key-value pair at its exact position is low, so we need to do some searching, and the complexity rises to O(n). The same can be said for the insert operation. Is this right?

This is a great question, and the answer is "it depends on what kind of hash table you're using."

In a chained hash table, to store an item you hash it to a bucket, then store the item in that bucket. If multiple items end up in the same bucket, the bucket simply holds a list of all of them. (This is the most commonly taught version of a hash table.) In this kind of hash table, the expected number of elements in a bucket, assuming a good hash function, is O(α), where α denotes the load factor (the number of items divided by the number of buckets). That makes intuitive sense, since if you distribute your items randomly across the buckets you'd expect roughly α of them to end up in each bucket. In this case, as the load factor increases, you will have to do more and more work on average to find an element, since more elements will be in each bucket. The runtime of a lookup won't necessarily reach O(n), though, since the items will still be spread across the buckets even if there aren't nearly enough buckets to go around.
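
To make the chaining case concrete, here is a minimal simulation sketch. It assumes random 64-bit keys and a simple `key % num_buckets` hash as a stand-in for "a good hash function"; the name `chain_stats` is made up for this illustration and is not from any library.

```python
import random


def chain_stats(num_buckets, num_items, seed=0):
    """Insert random keys into a chained table and report chain lengths.

    Returns (mean_chain_length, longest_chain). The mean over all buckets
    is exactly the load factor num_items / num_buckets, which is also what
    a lookup landing in a uniformly random bucket scans on average.
    """
    rng = random.Random(seed)
    buckets = [[] for _ in range(num_buckets)]
    for _ in range(num_items):
        key = rng.getrandbits(64)  # assumption: keys behave like random numbers
        buckets[key % num_buckets].append(key)
    lengths = [len(chain) for chain in buckets]
    return sum(lengths) / num_buckets, max(lengths)


if __name__ == "__main__":
    for items in (500, 1000, 4000, 16000):
        mean, longest = chain_stats(1000, items)
        print(f"alpha={items / 1000:5.1f}  mean chain={mean:5.1f}  longest chain={longest}")
```

Running it should show the mean chain length tracking α, while the longest chain stays far below the total number of items, which is why lookups get slower as α grows but do not jump straight to O(n).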

A linear probing hash table works by having an array of slots. Whenever you hash an element, you go to its slot, then walk forward in the table until you either find the element or find a free slot. In that case, as the load factor approaches one, more and more table slots will be filled in, and you'll find yourself in a situation where searches do take time O(n) in the worst case, because there will only be a few free slots to stop your search. (There's a beautiful and famous analysis by Don Knuth showing that, assuming the hash function behaves like a randomly-chosen function, the expected cost of an unsuccessful lookup or an insertion is O(1 / (1 - α)²). It's interesting to plot this function and see how the runtime grows as α gets closer and closer to one.)
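
As a rough way to see that growth without plotting, the sketch below fills a linear probing table to a given load factor and measures the average number of slots an unsuccessful lookup examines. It again assumes random 64-bit keys with `key % num_slots` as the hash, and the function names are invented for this example. For comparison it prints ½(1 + 1/(1 - α)²), the usual textbook form of Knuth's estimate, which is only an approximation for large tables.

```python
import random


def build_table(num_slots, load_factor, seed=0):
    """Fill a linear probing table with random keys up to the given load factor."""
    if not 0 <= load_factor < 1:
        raise ValueError("load_factor must be in [0, 1) so a free slot always exists")
    rng = random.Random(seed)
    table = [None] * num_slots
    for _ in range(int(load_factor * num_slots)):
        key = rng.getrandbits(64)  # assumption: keys behave like random numbers
        i = key % num_slots
        while table[i] is not None:  # walk forward to the next free slot
            i = (i + 1) % num_slots
        table[i] = key
    return table


def mean_unsuccessful_probes(table):
    """Average slots examined (including the final free one) over every start slot."""
    n = len(table)
    total = 0
    for start in range(n):
        i, probes = start, 1
        while table[i] is not None:
            i = (i + 1) % n
            probes += 1
        total += probes
    return total / n


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.75, 0.9):
        measured = mean_unsuccessful_probes(build_table(20000, alpha))
        estimate = 0.5 * (1 + 1 / (1 - alpha) ** 2)
        print(f"alpha={alpha:4.2f}  measured={measured:6.2f}  estimate={estimate:6.2f}")
```

The measured values depend on the seed and the table size, so expect them to fluctuate around the estimate, especially at high load factors where long runs of occupied slots dominate.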

Hope this helps!
