Hashtable with doubly linked lists?

Introduction to Algorithms (CLRS) states that a hash table using doubly linked lists is able to delete items more quickly than one with singly linked lists. Can anybody tell me what the advantage of using doubly linked lists instead of singly linked lists is for deletion in a hashtable implementation?

The confusion here is due to the notation in CLRS. To be consistent with the question as asked, I use the CLRS notation in this answer.

We use the hash table to store key-value pairs. The value portion is not mentioned in the CLRS pseudocode, while the key portion is defined as k.

In my copy of CLR (I am working off of the first edition here), the routines listed for hashes with chaining are insert, search, and delete (with more verbose names in the book). The insert and delete routines take argument x, which is the linked list element associated with key key[x]. The search routine takes argument k, which is the key portion of a key-value pair. I believe the confusion is that you have interpreted the delete routine as taking a key, rather than a linked list element.

Since x is a linked list element, having it alone is sufficient to do an O(1) deletion from the linked list in the h(key[x]) slot of the hash table, if it is a doubly-linked list. If, however, it is a singly-linked list, having x is not sufficient. In that case, you need to start at the head of the linked list in slot h(key[x]) of the table and traverse the list until you finally hit x to get its predecessor. Only when you have the predecessor of x can the deletion be done, which is why the book states that the singly-linked case leads to the same running times for search and delete.
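To make that concrete, here is a minimal C sketch of the doubly-linked case (the node type, bucket array T, and hash function h are illustrative assumptions, not CLRS's code): given the list element x alone, the unlink touches only x's neighbours and, at most, the bucket head.

#include <stddef.h>

/* Hypothetical node type for one chain of a chained hash table. */
struct entry {
    struct entry *prev, *next;   /* links within the bucket's doubly linked list */
    int key;                     /* value fields omitted */
};

/* O(1) deletion given the element x itself: no traversal of the chain. */
void chained_hash_delete(struct entry **T, size_t (*h)(int), struct entry *x)
{
    if (x->prev != NULL)
        x->prev->next = x->next;
    else
        T[h(x->key)] = x->next;   /* x was the head of its bucket */
    if (x->next != NULL)
        x->next->prev = x->prev;
}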

Additional Discussion

Although CLRS says that you can do the deletion in O(1) time, assuming a doubly-linked list, it also requires that you already have x in hand when calling delete. The point is this: they defined the search routine to return an element x. That search is not constant time for an arbitrary key k. Once you get x from the search routine, you avoid incurring the cost of another search in the call to delete when using doubly-linked lists.

The pseudocode routines are lower level than you would use if presenting a hash table interface to a user. For instance, a delete routine that takes a key k as an argument is missing. If that delete is exposed to the user, you would probably just stick to singly-linked lists and have a special version of search to find the x associated with k and its predecessor element all at once.
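For instance, such a user-facing delete could be sketched as follows in C (hypothetical names and types, not from the book); a pointer-to-pointer walk finds the matching element and splices it out in a single pass, so the chain only needs to be singly linked.

#include <stddef.h>

/* Hypothetical singly linked chain node. */
struct sentry {
    struct sentry *next;
    int key;
};

/* Delete by key: one walk of the chain both finds and unlinks the match. */
void chained_hash_delete_key(struct sentry **T, size_t (*h)(int), int k)
{
    struct sentry **pp = &T[h(k)];        /* link that points at the current node */
    while (*pp != NULL && (*pp)->key != k)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = (*pp)->next;                /* unlink; freeing is left to the caller */
}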

I can think of one reason, but this isn't a very good one. Suppose we have a hash table of size 100. Now suppose values A and G are each added to the table. Maybe A hashes to slot 75. Now suppose G also hashes to 75, and our collision resolution policy is to jump forward by a constant step size of 80, so we try to jump to (75 + 80) % 100 = 55. Now, instead of starting at the front of the list and traversing forward 85, we could start at the current node and traverse backwards 20, which is faster. When we get to the node that G is at, we can mark it as a tombstone to delete it.

Still, I recommend using arrays when implementing hash tables.

A hashtable is often implemented as a vector of lists, where the index in the vector is the key (hash).
If you don't have more than one value per key and you are not interested in any logic regarding those values, a singly linked list is enough. A more complex or specific design for selecting one of the values may require a doubly linked list.
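As a rough illustration of that layout (hypothetical names, fixed table size), the "vector of lists" could be declared like this in C:

#define NBUCKETS 128

/* One key-value entry on a singly linked chain. */
struct slot_entry {
    struct slot_entry *next;
    int key;
    int value;
};

/* The table itself: the hash of a key indexes into bucket[]. */
struct hashtable {
    struct slot_entry *bucket[NBUCKETS];
};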

Let's design the data structures for a caching proxy. We need a map from URLs to content; let's use a hash table. We also need a way to find pages to evict; let's use a FIFO queue to track the order in which URLs were last accessed, so that we can implement LRU eviction. In C, the data structure could look something like

struct node {
    struct node *queueprev, *queuenext;              /* neighbours in the circular LRU queue */
    struct node **hashbucketprev, *hashbucketnext;   /* hashbucketprev points at the link that points to this node */
    const char *url;                                 /* key */
    const void *content;                             /* cached page body */
    size_t contentlength;
};
struct node *queuehead;   /* circular doubly-linked list; least recently accessed node first */
struct node **hashbucket; /* bucket array; each slot heads a chain of nodes */

One subtlety: to avoid a special case and wasting space in the hash buckets, x->hashbucketprev points to the pointer that points to x. If x is first in the bucket, it points into hashbucket; otherwise, it points into another node. We can remove x from its bucket with

if (x->hashbucketnext != NULL)   /* guard: x may be the last node in its bucket */
    x->hashbucketnext->hashbucketprev = x->hashbucketprev;
*(x->hashbucketprev) = x->hashbucketnext;

When evicting, we iterate over the least recently accessed nodes via the queuehead pointer. Without hashbucketprev, we would need to hash each node and find its predecessor with a linear search, since we did not reach it via hashbucketnext. (Whether that's really bad is debatable, given that the hash should be cheap and the chain should be short. I suspect that the comment you're asking about was basically a throwaway.)
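A sketch of that eviction step, building on the struct node definitions above; it assumes queuehead points at the least recently accessed node of the circular queue, and it leaves aside who frees url and content, since the answer does not specify either.

#include <stdlib.h>

void evict_lru(void)
{
    struct node *x = queuehead;
    if (x == NULL)
        return;

    /* Unlink x from the circular usage queue. */
    if (x->queuenext == x) {
        queuehead = NULL;                          /* x was the only node */
    } else {
        x->queueprev->queuenext = x->queuenext;
        x->queuenext->queueprev = x->queueprev;
        queuehead = x->queuenext;
    }

    /* Unlink x from its hash bucket in O(1), exactly as shown above. */
    if (x->hashbucketnext != NULL)
        x->hashbucketnext->hashbucketprev = x->hashbucketprev;
    *(x->hashbucketprev) = x->hashbucketnext;

    free(x);   /* url/content cleanup depends on how they were allocated */
}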

If the items in your hashtable are stored in "intrusive" lists, they can be aware of the linked list they are a member of. Thus, if the intrusive list is also doubly-linked, items can be quickly removed from the table.

(Note, though, that the "intrusiveness" can be seen as a violation of abstraction principles...)

An example: in an object-oriented context, an intrusive list might require all items to be derived from a base class.

class BaseListItem {
  BaseListItem *prev, *next;

  ...

public: // list operations
  void insertAfter(BaseListItem*);
  void insertBefore(BaseListItem*);
  void removeFromList();   // unlink this item in O(1) using its prev and next pointers
};

The performance advantage is that any item can be quickly removed from its doubly-linked list without locating or traversing the rest of the list.

Unfortunately my copy of CLRS is in another country right now, so I can't use it as a reference. However, here's what I think it is saying:

Basically, a doubly linked list supports O(1) deletions because if you know the address of the item, you can just do something like:

x.left.right = x.right;
x.right.left = x.left;

to delete the object from the linked list, whereas with a singly linked list, even if you have the address, you still need to search through the list to find its predecessor in order to do:

pred.next = x.next

So, when you delete an item from the hash table, you look it up, which is expected O(1) thanks to the properties of hash tables, and then delete it in O(1), since you now have the address.

If this were a singly linked list, you would need to find the predecessor of the object you wish to delete, which would take O(n).
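To spell that out, here is a C sketch (node type and names are hypothetical): even with x in hand, a singly linked chain forces a walk from the bucket head to reach x's predecessor.

#include <stddef.h>

/* Hypothetical singly linked node. */
struct snode {
    struct snode *next;
    int key;
};

/* Deleting a known node x still requires walking from the bucket head
   until we reach the link that points at x. */
void delete_known_node(struct snode **T, size_t (*h)(int), struct snode *x)
{
    struct snode **pp = &T[h(x->key)];
    while (*pp != x)          /* O(chain length) in the worst case */
        pp = &(*pp)->next;
    *pp = x->next;            /* unlink x */
}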


However:

I am also slightly confused about this assertion in the case of chained hash tables, because of how lookup works. In a chained hash table, if there is a collision, you already need to walk through the linked list of values in order to find the item you want, and thus would need to also find its predecessor.

But the way the statement is phrased gives clarification: "If the hash table supports deletion, then its linked lists should be doubly linked so that we can delete an item quickly. If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x's predecessor."

This is saying that you already have element x, which means you can delete it in the above manner. If you were using a singly linked list, even if you had element x already, you would still have to find its predecessor in order to delete it.
