How do I implement an erase function for a hash table?

I have a hash table using linear probing. I've been given the task of writing an erase(int key) function with the following guidelines.

  void erase(int key);
  Preconditions: key >= 0
  Postconditions: If a record with the specified key exists in the table, then that record has been removed; otherwise the table is unchanged.

I was also given some hints to accomplish the task:

  • It is important to realize that the insert function will allow you to add a new entry to the table, or to update an existing entry in the table.

  • For the linear probing version, notice that the code to insert an item has two searches. The insert() function calls function findIndex() to search the table to see if the item is already in the table. If the item is not in the table, a second search is done to find the position in the table to insert the item. Adding the ability to delete an entry will require that the insertion process be modified. When searching for an existing item, be sure that the search does not stop when it comes to a location that was occupied but is now empty because the item was deleted. When searching for a position to insert a new item, use the first empty position - it does not matter if the position has ever been occupied or not.

So I've started writing erase(key) and I seem to have run into the problem that the hints are referring to, but I'm not sure what they mean. I'll provide code in a second. To test my code, I set up the hash table so that it has a collision, then I erase the colliding key and rehash the table, but the remaining key doesn't move into the correct location.

For instance, I add a few elements into my hash table:

The hash table is:
Index  Key    Data
    0   31     3100
    1    1     100
    2    2     200
    3   -1
    4   -1
    5   -1
    6   -1
    7   -1
    8   -1
    9   -1
   10   -1
   11   -1
   12   -1
   13   -1
   14   -1
   15   -1
   16   -1
   17   -1
   18   -1
   19   -1
   20   -1
   21   -1
   22   -1
   23   -1
   24   -1
   25   -1
   26   -1
   27   -1
   28   -1
   29   -1
   30   -1

So all of my values are empty except the first 3 indices. Obviously key 31 should be going into index 1. But since key 1 is already there, it collides and settles for index 0. I then erase key 1 and rehash the table, but key 31 stays at index 0.

Here are the functions that may be worth looking at:

void Table::insert( const RecordType& entry )
{
   bool alreadyThere;
   int index;

   assert( entry.key >= 0 );

   findIndex( entry.key, alreadyThere, index );
   if( alreadyThere )
      table[index] = entry;   
   else
   {
      assert( size( ) < CAPACITY );
      index = hash( entry.key );
      while ( table[index].key != -1 )
         index = ( index + 1 ) % CAPACITY;
      table[index] = entry;
      used++;
   }
}

Since insert uses findIndex, I'll include that as well:

void Table::findIndex( int key, bool& found, int& i ) const
{
   int count = 0;

   assert( key >=0 );

   i = hash( key );
   while ( count < CAPACITY && table[i].key != -1 && table[i].key != key )
   {
      count++;
      i = (i + 1) % CAPACITY;
   }   
   found = table[i].key == key;
}

And here is my current start on erase:

void Table::erase(int key) 
{
    assert(key >= 0);

    bool found, rehashFound;
    int index, rehashIndex;

    //check if key is in table
    findIndex(key, found, index);

    //if key is found, remove it
    if(found)
    {
        //remove key at position
        table[index].key = -1;
        table[index].data = NULL;
        cout << "Found key and removed it" << endl;
        //reduce the number of used keys
        used--;
        //rehash the table

        for(int i = 0; i < CAPACITY; i++)
        {
            if(table[i].key != -1)
            {
                cout << "Rehashing key : " << table[i].key << endl;
                findIndex(table[i].key, rehashFound, rehashIndex);
                cout << "Rehashed to index : " << rehashIndex << endl;
                table[rehashIndex].key = table[i].key;
                table[rehashIndex].data = table[i].data;
            }
        }
    }
}

Can someone explain what I need to do to make it rehash properly? I understand the concept of a hash table, but I seem to be doing something wrong here.

EDIT

As per a user's suggestion:

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;

    findIndex(key, found, index);

    if(found) 
    {
        table[index].key = -2;
        table[index].data = NULL;
        used--;

    }

}


//modify insert(const RecordType & entry)

while(table[index].key != -1 || table[index].key != -2)


//modify findIndex

while(count < CAPACITY && table[i].key != -1
      && table[i].key != -2 && table[i].key != key)

When deleting an item from the table, don't move anything around. Just stick a "deleted" marker there. On an insert, treat deletion markers as empty and available for new items. On a lookup, treat them as occupied and keep probing if you hit one. When resizing the table, ignore the markers.
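
As a rough illustration of the marker approach, here is a minimal self-contained sketch. The RecordType layout, the hash function (key % CAPACITY), the contains() helper and the names NEVER_USED and DELETED are assumptions made for the example, since the original class was only partly shown. Two details are easy to get wrong: the insert loop has to stop at either kind of free slot (a test such as key != -1 || key != -2 is always true and never terminates), and the lookup loop has to keep going past a deleted slot.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's record type (assumed int data).
struct RecordType { int key; int data; };

class Table {
public:
    static constexpr int CAPACITY = 31;
    static constexpr int NEVER_USED = -1;  // slot has never held a record
    static constexpr int DELETED = -2;     // slot held a record that was erased

    Table() : used(0) {
        for (int i = 0; i < CAPACITY; ++i) {
            table[i].key = NEVER_USED;
            table[i].data = 0;
        }
    }

    void insert(const RecordType& entry) {
        assert(entry.key >= 0);
        bool found;
        int index;
        findIndex(entry.key, found, index);
        if (found) {                 // update an existing record
            table[index] = entry;
            return;
        }
        assert(used < CAPACITY);
        index = hash(entry.key);
        // Any free slot will do: stop at NEVER_USED or DELETED (both negative).
        while (table[index].key >= 0)
            index = (index + 1) % CAPACITY;
        table[index] = entry;
        ++used;
    }

    void erase(int key) {
        assert(key >= 0);
        bool found;
        int index;
        findIndex(key, found, index);
        if (found) {
            table[index].key = DELETED;  // leave a marker, move nothing
            table[index].data = 0;
            --used;
        }
    }

    bool contains(int key) const {
        bool found;
        int index;
        findIndex(key, found, index);
        return found;
    }

    int size() const { return used; }

private:
    RecordType table[CAPACITY];
    int used;

    // Assumed hash function; the question did not show the real one.
    static int hash(int key) { return key % CAPACITY; }

    // Lookup stops only at a NEVER_USED slot or at the key itself;
    // DELETED slots are probed past.
    void findIndex(int key, bool& found, int& i) const {
        int count = 0;
        i = hash(key);
        while (count < CAPACITY && table[i].key != NEVER_USED && table[i].key != key) {
            ++count;
            i = (i + 1) % CAPACITY;
        }
        found = (table[i].key == key);
    }
};
```

With this sketch, inserting keys 1 and 32 (which collide under the assumed hash), then erasing 1, still lets a lookup reach 32, and a later insert of another colliding key reuses the marked slot.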

Note that this can cause problems if the table is never resized. If the table is never resized, after a while, your table will have no entries marked as never used, and lookup performance will go to hell. Since the hints mention keeping track of whether an empty position was ever used and treating once-used cells differently from never-used, I believe this is the intended solution. Presumably, resizing the table will be a later assignment.
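
To make the degradation concrete, the following sketch counts how many slots an unsuccessful lookup examines. It uses a hypothetical keys-only table of 8 slots with an assumed hash of key % CAPACITY; the names TinyTable and find are made up for the illustration.

```cpp
#include <cassert>

// Hypothetical miniature table: keys only, markers as in the deleted-marker scheme.
struct TinyTable {
    static constexpr int CAPACITY = 8;
    static constexpr int NEVER_USED = -1;
    static constexpr int DELETED = -2;

    int keys[CAPACITY];

    TinyTable() {
        for (int i = 0; i < CAPACITY; ++i)
            keys[i] = NEVER_USED;
    }

    // Returns the slot holding key, or -1 if absent.
    // probes reports how many slots were examined.
    int find(int key, int& probes) const {
        int i = key % CAPACITY;  // assumed hash function
        probes = 0;
        while (probes < CAPACITY) {
            ++probes;
            if (keys[i] == key) return i;
            if (keys[i] == NEVER_USED) return -1;  // only a never-used slot ends a miss
            i = (i + 1) % CAPACITY;
        }
        return -1;  // every slot was occupied or marked deleted
    }

    void insert(int key) {
        int probes;
        if (find(key, probes) != -1) return;  // already present
        int i = key % CAPACITY;
        int steps = 0;
        while (keys[i] >= 0 && steps < CAPACITY) {
            i = (i + 1) % CAPACITY;
            ++steps;
        }
        if (keys[i] < 0) keys[i] = key;  // a full table is silently ignored here
    }

    void erase(int key) {
        int probes;
        int i = find(key, probes);
        if (i != -1) keys[i] = DELETED;
    }
};

// Number of slots examined when looking up a key that is not in the table.
int probesForMiss(const TinyTable& t, int key) {
    int probes;
    t.find(key, probes);
    return probes;
}
```

On a fresh table a miss costs one probe. After each of the keys 0 through 7 has been inserted and erased once, every slot carries a deleted marker, so the same miss examines all 8 slots even though the table holds no records.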

It's not necessary to rehash the entire table every time a delete is done. If you want to minimise degradation in performance, you can compact the table instead. Consider each element after the deleted one (wrapping from the end of the table to the front is allowed) up to the next -1: if it hashes to a bucket at or before the deleted element, it can be moved into the vacated slot, which puts it at, or at least closer to, its hash bucket. Then repeat the compaction process for the slot the moved element just left.

Doing this kind of compaction will remove the biggest flaw in your current code, which is that after a little use every bucket will be marked as either in use or having been used, and performance for e.g. a find of a non-existent value will degrade to O(CAPACITY).

Off the top of my head with no compiler/testing...

int Table::next(int index) const
{
    return (index + 1) % CAPACITY;
}

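// Forward distance from 'from' to 'to', wrapping around the end of the table.
// Must be 0 when from == to, otherwise an element already sitting in its own
// hash bucket would look as far away as possible and be moved by mistake.
int Table::distance(int from, int to) const
{
    return from <= to ? to - from : to + CAPACITY - from;
}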

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;

    findIndex(key, found, index);

    if (found) 
    {
        // compaction...
        int limit = CAPACITY - 1;
        for (int compact_from = next(index);
             limit-- && table[compact_from].key >= 0;
             compact_from = next(compact_from))
        {
            int ideal = hash(table[compact_from].key);
            if (distance(ideal, index) <
                distance(ideal, compact_from))
            {
                table[index] = table[compact_from];
                index = compact_from;
            }
        }

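        // deletion
        table[index].key = -1;
        // Reset data as in the question. If data were an owning pointer, the
        // erased record's data would have to be freed before compacting: after
        // compaction this slot's contents may have been copied to another slot.
        table[index].data = NULL;
        --used;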
    }
}
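
For anyone who wants to try the compaction idea in isolation, here is a self-contained sketch of the same approach. The ShiftTable type, its parallel keys/data arrays, the int data and the key % CAPACITY hash are assumptions for the example rather than the asker's actual class.

```cpp
#include <cassert>

// Hypothetical stand-alone table using -1 as the only "empty" marker.
struct ShiftTable {
    static constexpr int CAPACITY = 31;
    static constexpr int EMPTY = -1;

    int keys[CAPACITY];
    int data[CAPACITY];
    int used;

    ShiftTable() : used(0) {
        for (int i = 0; i < CAPACITY; ++i) {
            keys[i] = EMPTY;
            data[i] = 0;
        }
    }

    static int hash(int key) { return key % CAPACITY; }  // assumed hash
    static int next(int i) { return (i + 1) % CAPACITY; }

    // Forward (wrapping) distance from 'from' to 'to'; 0 when they are equal.
    static int distance(int from, int to) {
        return (to - from + CAPACITY) % CAPACITY;
    }

    // Returns the slot holding key, or -1 if it is not in the table.
    int find(int key) const {
        int i = hash(key);
        for (int n = 0; n < CAPACITY && keys[i] != EMPTY; ++n, i = next(i))
            if (keys[i] == key) return i;
        return -1;
    }

    void insert(int key, int value) {
        assert(key >= 0);
        int i = find(key);
        if (i == -1) {
            assert(used < CAPACITY);
            for (i = hash(key); keys[i] != EMPTY; i = next(i)) {}
            keys[i] = key;
            ++used;
        }
        data[i] = value;
    }

    void erase(int key) {
        int hole = find(key);
        if (hole == -1) return;  // table unchanged if the key is absent

        // Walk the rest of the cluster; pull back any element whose probe
        // sequence passes through the hole, then continue from its old slot.
        int limit = CAPACITY - 1;
        for (int j = next(hole); limit-- && keys[j] != EMPTY; j = next(j)) {
            int ideal = hash(keys[j]);
            if (distance(ideal, hole) < distance(ideal, j)) {
                keys[hole] = keys[j];
                data[hole] = data[j];
                hole = j;
            }
        }
        keys[hole] = EMPTY;
        data[hole] = 0;
        --used;
    }
};
```

Because erase leaves no markers behind, a lookup can keep stopping at the first empty slot, and the table never fills up with once-used cells.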
