
Hash Table with Chaining (Table Doubling)

  1. How can a hash table with chaining be fixed when all the items hash into the same slot (one giant linked list)?
  2. Does a hash table with chaining use table doubling? If so, when is a good time to double the size of the table?

Expanding on the answer from NikiC's comments:

For your first question, this is unfortunately a real possibility when implementing a chained hash table. Assuming that you have a good hash function - or, better yet, by choosing a hash function that involves some element of randomness - this is extremely unlikely. Unfortunately, Bad People sometimes use this to take down web servers. Not too long ago, an attack called a "Hash DoS" was developed whereby someone would craft a bunch of specialized requests to a web server that would cause everything to get stored in the same slot in a chained hash table, which led to huge performance drops and eventually took some websites offline. The good news, though, is that many programming language implementations have been updated so that their hash tables aren't vulnerable to attacks like this.
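The "element of randomness" defense mentioned above can be sketched as follows. This is an illustrative toy (the FNV-style loop and the name `seeded_hash` are my own, not from any library): a secret per-process seed is mixed into the hash, so an attacker who cannot see the seed cannot precompute a set of keys that all land in the same slot. CPython uses the same idea, keying its string hash (SipHash) with a random value at interpreter startup.

```python
import random

# Secret per-process seed. An attacker who cannot observe it
# cannot craft keys guaranteed to collide in our table.
_SEED = random.getrandbits(64)

def seeded_hash(key: str, num_slots: int) -> int:
    """Toy salted hash: fold the secret seed into an FNV-1a-style loop."""
    h = _SEED ^ 0xCBF29CE484222325
    for ch in key:
        h = ((h ^ ord(ch)) * 0x100000001B3) % (1 << 64)
    return h % num_slots
```

Within one process the function is deterministic (the same key maps to the same slot every time), but the mapping differs from run to run, which is exactly what breaks precomputed Hash DoS payloads.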

For your second question, the answer is "it depends." Most good implementations of chained hash tables do rehash and grow when the load factor gets too high (usually, load factors between 1 and 2 are common). Some implementations do not, though. For example, the ConcurrentHashMap implementation in Java, I believe, does not do any rehashing, because doing so is not feasible when many reads and writes are executing concurrently.
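The grow-when-the-load-factor-gets-too-high policy can be sketched like this (a minimal illustration, not a production design; the class name and the threshold of 1.0 are choices made here for the example):

```python
class ChainedHashTable:
    """Separate chaining with table doubling (illustrative sketch)."""

    def __init__(self, initial_slots=8, max_load=1.0):
        self._slots = [[] for _ in range(initial_slots)]
        self._size = 0
        self._max_load = max_load  # rehash once size / slots exceeds this

    def _bucket(self, key):
        return self._slots[hash(key) % len(self._slots)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._slots) > self._max_load:
            self._grow()

    def _grow(self):
        old = self._slots
        self._slots = [[] for _ in range(2 * len(old))]  # table doubling
        for bucket in old:
            for k, v in bucket:
                # Every item must be rehashed, because the slot index
                # depends on the (now doubled) table size.
                self._slots[hash(k) % len(self._slots)].append((k, v))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```

Doubling keeps the amortized cost of `insert` O(1): each rehash touches all n items, but it happens only after n new insertions, and keeping the load factor bounded keeps the expected chain length constant.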

In each position of the hash table, add a secondary data structure, such as a binary tree or another hash table.

For example, if multiple values get hashed to the same position in the first hash table, then placing them into a binary tree at that position will let you search/insert among them in O(lg n) time, as opposed to O(n) time with a linked list.
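A minimal sketch of such a tree bucket (an unbalanced BST for brevity, so O(lg n) holds only in expectation; real implementations use a balanced tree, e.g. Java 8's HashMap converts long chains into red-black trees):

```python
class _Node:
    """One entry in a BST bucket."""
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def bst_insert(root, key, value):
    """Insert into a BST bucket; returns the (possibly new) root."""
    if root is None:
        return _Node(key, value)
    if key < root.key:
        root.left = bst_insert(root.left, key, value)
    elif key > root.key:
        root.right = bst_insert(root.right, key, value)
    else:
        root.value = value  # duplicate key: overwrite
    return root

def bst_search(root, key):
    """Walk down the tree; O(height) comparisons instead of O(n)."""
    while root is not None:
        if key < root.key:
            root = root.left
        elif key > root.key:
            root = root.right
        else:
            return root.value
    return None
```

Note the prerequisite: keys in the bucket must be comparable (ordered), which a plain linked-list bucket does not require.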

If you want to use another hash table, then you need to apply a different hash function to the values that land at the same position in the first hash table. After this second hashing, the values will hopefully land in different positions of the second hash table.
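A sketch of that two-level scheme (the functions `h1`/`h2` and the salting trick are illustrative assumptions; `h2` is made different from `h1` by hashing the key together with a fixed salt, so keys that collide at the first level usually spread out at the second):

```python
M1, M2 = 8, 8  # first- and second-level table sizes

def h1(key: str) -> int:
    """First-level hash function."""
    return hash(key) % M1

def h2(key: str) -> int:
    """Second-level hash: a *different* function, here built by
    hashing the key together with a fixed salt."""
    return hash((key, "second-level")) % M2

# Each first-level slot holds its own second-level table of M2 buckets.
table = [[[] for _ in range(M2)] for _ in range(M1)]

def insert(key, value):
    table[h1(key)][h2(key)].append((key, value))

def lookup(key):
    for k, v in table[h1(key)][h2(key)]:
        if k == key:
            return v
    return None
```

The key point is that reusing `h1` at the second level would be useless: everything in one first-level slot already collides under `h1`, so only an independent function can separate those keys.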
