Separate chaining in hashing

I am reading about hashing in Robert Sedgewick's book Algorithms in C++.

We might be using a header node to streamline the code for insertion into an ordered list, but we might not want to use M header nodes for individual lists in separate chaining. Indeed, we could even eliminate the M links to the lists by having the first nodes in the lists comprise the table.

class ST
{
    struct node
    {
        Item item;
        node* next;

        node(Item x, node* t)
            { item = x; next = t; }
     };

     typedef node *link;

  private:
     link* heads;
     int N, M;

     Item searchR(link t, Key v)
     {
          if (t == 0) return nullItem;
          if (t->item.key() == v) return t->item;
          return searchR(t->next, v);
     }

   public:
     ST(int maxN)
     {
          N = 0; M = maxN/5;
          heads = new link[M];
          for (int i = 0; i < M; i++) heads[i] = 0;
     }

     Item search(Key v)
         { return searchR(heads[hash(v, M)], v); }

     void insert(Item item)
         { int i = hash(item.key(), M);
           heads[i] = new node(item, heads[i]); N++; }
};

I have two questions about the text above. What does the author mean by:

  1. "We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table." How can we modify the above code for this?

  2. "We might not want to use M header nodes for individual lists in separate chaining." What does this statement mean?

"We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table."

Consider Node* x[n] vs Node x[n]: the former needs an extra pointer per bucket, a memory allocation on insertion for the head Node of every non-empty bucket, and an extra indirection for every hash table operation. The latter eliminates the n pointers but requires that unused elements can be put in some discernible not-in-use state (tracking which may or may not require extra memory), and if sizeof(Node) is greater than sizeof(Node*), it may be more wasteful of memory anyway when many buckets are empty.

The difference in memory use can also affect cache efficiency: if the table has a high element-to-bucket ratio then a Node[] gets the Node data into fewer contiguous memory pages, and if you're iterating (in unsorted order) it's very cache efficient, whereas a Node*[] will jump to separate memory allocations that might be all over the place (or, on the other hand, might actually be quite close together in some useful cases, e.g. if both access patterns and dynamic memory allocation addresses correlate with the chronological order of object creation).
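To put rough numbers on that trade-off, here is a small sketch. The Item and Node types and the helper names pointerLayoutBytes and nodeLayoutBytes are made up for illustration. The count ignores allocator overhead and assumes each non-empty bucket holds exactly one item.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types, only here to make the sizes concrete.
struct Item { int key; double payload; };
struct Node { Item item; Node* next; };

// Node* x[buckets]: one pointer per bucket, plus one separately
// allocated Node for each of the `used` non-empty buckets.
inline std::size_t pointerLayoutBytes(std::size_t buckets, std::size_t used) {
    return buckets * sizeof(Node*) + used * sizeof(Node);
}

// Node x[buckets]: every bucket is a full Node whether it is in use or
// not, so the cost does not depend on how many buckets are occupied.
inline std::size_t nodeLayoutBytes(std::size_t buckets) {
    return buckets * sizeof(Node);
}
```

With every bucket occupied, the Node[] table is smaller by exactly the n pointers. With every bucket empty, the Node*[] table is smaller, because sizeof(Node) is larger than sizeof(Node*).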

How can we modify the above code for this?

First, a note on your existing code: heads[i] = new node(item, heads[i]); replaces the entry in the hash table, but because the new node's next pointer is set to the old heads[i], it prepends to the chain rather than discarding it. That shortcut relies on the table holding pointers, so insert needs rethinking once the table holds the nodes themselves: if the slot is empty you fill it in place, and if there's anything there you add to the list behind it.

The design change discussed needs:

link* heads;

...changed to...

node* head;

You'd initialise it like this:

head = new node[M];

This needs an extra node constructor (if Item has an equivalent default constructor, you can leave out its initialisation below):

node() : item(nullItem), next(nullptr) { }

Then there are some knock-on changes to the rest of your code that are easy to work through. Basically, you're getting rid of a layer of pointers.
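As a rough sketch of where those changes might end up (this is not the book's code): Key, Item, nullItem and hash below are minimal stand-ins I've assumed so the example compiles on its own. A null() test on Item is assumed as the way to recognise an unused slot, and count() is a hypothetical accessor added for checking.

```cpp
#include <cassert>

// Stand-ins (assumptions) for the book's Key, Item and nullItem.
typedef int Key;

struct Item {
    Key k;
    bool empty;                                   // true means "no item stored"
    Item() : k(0), empty(true) {}
    explicit Item(Key key) : k(key), empty(false) {}
    Key key() const { return k; }
    bool null() const { return empty; }
};

static const Item nullItem;                       // sentinel marking an unused slot

class ST {
    struct node {
        Item item;
        node* next;
        node() : item(nullItem), next(nullptr) {}           // unused table slot
        node(const Item& x, node* t) : item(x), next(t) {}
    };

    node* head;    // array of M nodes: the first node of each chain is the table entry
    int N, M;

    // Assumed hash function: key modulo table size.
    static int hash(Key v, int m) {
        return static_cast<int>(static_cast<unsigned>(v) % static_cast<unsigned>(m));
    }

public:
    explicit ST(int maxN) : head(nullptr), N(0), M(maxN / 5 > 0 ? maxN / 5 : 1) {
        head = new node[M];                       // uses the default node constructor
    }
    ~ST() {
        for (int i = 0; i < M; i++) {             // free only the heap-allocated tails
            node* t = head[i].next;
            while (t) { node* dead = t; t = t->next; delete dead; }
        }
        delete[] head;
    }
    ST(const ST&) = delete;
    ST& operator=(const ST&) = delete;

    Item search(Key v) const {
        const node* t = &head[hash(v, M)];
        if (t->item.null()) return nullItem;      // unused slot: empty chain
        for (; t != nullptr; t = t->next)
            if (t->item.key() == v) return t->item;
        return nullItem;
    }

    void insert(const Item& item) {
        node& slot = head[hash(item.key(), M)];
        if (slot.item.null())
            slot.item = item;                       // first item goes in the table itself
        else
            slot.next = new node(item, slot.next);  // later items are chained behind it
        N++;
    }

    int count() const { return N; }
};
```

Note the cost mentioned earlier: every slot needs a recognisable not-in-use state, which here is the empty flag carried by the assumed Item type.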

"We might not want to use M header nodes for individual lists in separate chaining." What does this statement mean?

I didn't write it so can't say authoritatively, but it appears to be saying that when designing the list code, a decision might have been made to have an initial Node even in an empty list, as this simplifies the code for several list operations. While the extra data-less Node might seem a reasonable price when contemplating "usual" uses of a list, hash tables are unusual in that you want most of the lists chained off the buckets to have 0 or 1 elements, with exponentially fewer being longer and longer. So such a list implementation is poorly suited to use in a hash table.
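For what it's worth, here is my guess at the kind of list being described, sketched with made-up names (HeaderList, PlainList, freeChain) and int keys. The header-node version has no special case for inserting at the front of an ordered list, but it carries one data-less Node per list, which in a hash table means M of them.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* next;
};

// Frees a heap-allocated chain of nodes.
inline void freeChain(Node* t) {
    while (t) { Node* dead = t; t = t->next; delete dead; }
}

// Ordered list with a dummy header node: insertion always has a
// predecessor to link from, so the front needs no special case.
struct HeaderList {
    Node header;                                  // holds no data
    HeaderList() : header{0, nullptr} {}
    ~HeaderList() { freeChain(header.next); }
    HeaderList(const HeaderList&) = delete;
    HeaderList& operator=(const HeaderList&) = delete;

    void insert(int key) {
        Node* prev = &header;
        while (prev->next && prev->next->key < key) prev = prev->next;
        prev->next = new Node{key, prev->next};
    }
};

// Ordered list without a header: an empty list costs one pointer,
// but inserting at the front has to be handled separately.
struct PlainList {
    Node* first;
    PlainList() : first(nullptr) {}
    ~PlainList() { freeChain(first); }
    PlainList(const PlainList&) = delete;
    PlainList& operator=(const PlainList&) = delete;

    void insert(int key) {
        if (!first || key <= first->key) {        // special case: new front node
            first = new Node{key, first};
            return;
        }
        Node* prev = first;
        while (prev->next && prev->next->key < key) prev = prev->next;
        prev->next = new Node{key, prev->next};
    }
};
```

An empty HeaderList already occupies a whole Node, while an empty PlainList is a single null pointer. That difference matters when most of the M lists are expected to be empty or to hold one element.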
