C - Making a separate chaining hash table - issue

Question

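I spent some time on this, trying to use understandable variable names and the like. I tried to make it look clean and tidy so that I could debug it easily. But I can't seem to find my problem... the terminal doesn't output anything. Please help me find my mistake!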

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node *node_ptr;

struct list_node
{
    node_ptr next;
    char *key;
    char *value;
    
};

typedef node_ptr LIST;
typedef node_ptr position;

struct hash_table
{
    LIST *list_ptr_arr;
    unsigned int table_size;
};

typedef struct hash_table *HASHTABLE;

unsigned long long int
hash(const char *key, unsigned int hash_size)
{

    unsigned long long int hash;

    for(int i = 0; key[i]; i++)
    {
        hash = (hash<<32)+key[i];
    }

    return (hash%hash_size);

}

unsigned int 
next_prime(int number)
{

    int j;

    for(int i = number; ; i++)
    {
        for(j = 2; j<i; j++)
        {
            if(i%j == 0){break;}
        }

        if(i==j){return j;}
    }
}

HASHTABLE
initialize(unsigned int table_size)
{
    HASHTABLE H;

    H = (HASHTABLE) malloc(sizeof(struct hash_table));
    if(H==NULL){printf("Out of Space!"); return 0;}

    H->table_size = next_prime(table_size);

    H->list_ptr_arr = (position*) malloc(sizeof(LIST)*table_size);
    if(H->list_ptr_arr==NULL){printf("Out of Space!"); return 0;}

    H->list_ptr_arr = (LIST*) malloc(sizeof(struct list_node)*table_size);

    for(unsigned int i = 0; i<table_size; i++)
    {
        if(H->list_ptr_arr[i]==NULL){printf("Out of Space!"); return 0;}

        H->list_ptr_arr[i]=NULL;
    }


    return H;
    
}



void
insert(const char *key, const char *value, HASHTABLE H)
{
    unsigned int slot = hash(key, H->table_size);
    node_ptr entry = H->list_ptr_arr[slot];

    node_ptr prev;

    while(entry!=NULL)
    {
        if(strcmp(entry->key, key)==0)
        {
            free(entry->value);
            entry->value = malloc(strlen(value)+1);
            strncpy(entry->value,value,strlen(value));
            return;
        }

        prev = entry;
        entry = prev->next;

    }

    entry = (position) malloc(sizeof(struct list_node));
    entry->value = malloc(strlen(value)+1);
    entry->key = malloc(strlen(key)+1);
    strncpy(entry->key,key,strlen(key));
    strncpy(entry->value,value,strlen(value));
    entry->next = NULL;
    prev->next = entry;

}

void
dump(HASHTABLE H)
{

    for(unsigned int i = 0; i<H->table_size; i++)
    {
        position entry = H->list_ptr_arr[i];

        if(H->list_ptr_arr[i]==NULL){continue;}

        printf("slot[%d]: ", i);

        for(;;)
        {
            printf("%s|%s -> ", entry->key, entry->value);

            if(entry->next == NULL)
            {
                printf("NULL");
                break;
            }

            entry = entry->next;
        }

        printf("\n");

    }

}


int main()
{
  
    HASHTABLE H = initialize(10);
    insert("name1", "David", H);
    insert("name2", "Lara", H);
    insert("name3", "Slavka", H);
    insert("name4", "Ivo", H);
    insert("name5", "Radka", H);
    insert("name6", "Kvetka", H);
    dump(H);
  
    return 0;   
    
}

I tried modifying it and changing a few things, but nothing helped...

Thanks in advance!

Answer 1

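There are some cosmetic issues and at least two bugs that break the code. I won't go into the minor things, which are mostly style, but your initialize() and insert() functions do not work.

In initialize(), you allocate memory for H->list_ptr_arr twice. The first allocation is leaked for no reason, but of course a leak will not crash your code. In the second allocation you request the wrong size: you use sizeof(struct list_node) * table_size, but you want an array of pointers, not of structs (and since the struct contains pointers, the struct is larger). Again, this only wastes memory and will not crash anything. Still, it is better to allocate the right amount, which you can do with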

H->list_ptr_arr = malloc(table_size * sizeof *H->list_ptr_arr);

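You don't need to cast the result of malloc(); it returns a void *, which converts to any object pointer type without a cast, but that is a matter of style. The important part of that line is that we take the size of the underlying data from the variable we are assigning to, which guarantees we always get the right size even if we change the type at some point. I also use sizeof(type) from time to time, but sizeof *ptr is the better pattern and worth getting used to.

Anyway, although you allocate the wrong amount of memory, you allocate enough, so your program doesn't crash because of it. But when you then loop over the allocated bins in the table, you return with an error if they are NULL. They are not initialized at all, so if they are NULL (and they may well be), it is pure luck. Or bad luck, if you take it as a sign of an error. And if you do consider NULL a signal of an allocation error here, why do you set every bin to NULL right after concluding that it isn't?

In fact, if you happen to get a NULL pointer in the array, your initialization aborts, and since you don't check for allocation errors in main() (which is fine for a test), that may be why your program crashes. It is not the main problem, and it only happens if you get a NULL in one of the bins by chance, but it can happen. Don't check for NULL when you run through the bins. The bins are uninitialized. Just set each one to NULL.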
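
The initialization points above could be put together roughly as follows. This is only a sketch under some assumptions: the name initialize_sketch is hypothetical, the type definitions are repeated from the question so the block compiles on its own, and rounding the size up to a prime is left out. One more thing it tries to avoid: in the question, H->table_size is set to next_prime(table_size) (11 for an argument of 10), while the loop that clears the bins only runs over table_size entries, so hash() can select a bin that was never set to NULL. The sketch uses H->table_size everywhere.

```c
#include <stdlib.h>

/* Same shapes as in the question, repeated so the sketch compiles alone. */
typedef struct list_node *node_ptr;
struct list_node { node_ptr next; char *key; char *value; };
struct hash_table { node_ptr *list_ptr_arr; unsigned int table_size; };
typedef struct hash_table *HASHTABLE;

/* Hypothetical name, chosen to avoid clashing with the question's initialize().
   Takes the final bin count as-is; rounding up to a prime is left out. */
HASHTABLE initialize_sketch(unsigned int table_size)
{
    if (table_size == 0) return NULL;

    HASHTABLE H = malloc(sizeof *H);
    if (H == NULL) return NULL;

    H->table_size = table_size;

    /* One allocation, sized from the pointer it is assigned to. */
    H->list_ptr_arr = malloc(H->table_size * sizeof *H->list_ptr_arr);
    if (H->list_ptr_arr == NULL) { free(H); return NULL; }

    /* The bins are uninitialized after malloc: set them, don't test them. */
    for (unsigned int i = 0; i < H->table_size; i++)
        H->list_ptr_arr[i] = NULL;

    return H;
}
```

Returning NULL on failure instead of printing leaves it to the caller to decide how to report the error; main() would then check the result before inserting anything.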

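The main problem is in insert(). Your prev variable is not initialized before the while loop, and if you never enter the loop, it isn't initialized after it either. Setting prev->next = entry while prev is uninitialized spells trouble and is a likely cause of a crash. Especially since the first time you insert something into a bin, entry will be NULL, so you hit the error on the very first insert. What happens when you dereference an uninitialized pointer is undefined, but it rarely means anything good. A crash is the best-case scenario.

I understand the logic here. You want to move prev along the list so you can insert the new entry at the end, and there is no last element before you have looped through the entries in the bin. But that doesn't mean you can't have an initialized pointer to the location where the new entry should go. If you use a pointer to a pointer, you can start at the entry in the table array. That is not a list_node, so a list_node * won't do for prev, but a list_node ** works just fine. You can do something like this: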

node_ptr new_entry(const char *key, const char *value)
{
  node_ptr entry = malloc(sizeof *entry);
  if (!entry) abort(); // allocation failed
  entry->key = malloc(strlen(key) + 1);
  entry->value = malloc(strlen(value) + 1);
  if (!entry->key || !entry->value) abort();
  // strcpy copies the terminating '\0'; strncpy with strlen() would not
  strcpy(entry->key, key);
  strcpy(entry->value, value);
  entry->next = NULL;
  return entry;
}

void
insert(const char *key, const char *value, HASHTABLE H)
{
    unsigned int slot = hash(key, H->table_size);
    node_ptr entry = H->list_ptr_arr[slot];

    // Make sure that we always have a prev, by pointing it
    // to the location where we want to insert a new entry,
    // which we want at the bin if nothing else
    node_ptr *loc = &H->list_ptr_arr[slot];

    while(entry != NULL)
    {
        if(strcmp(entry->key, key)==0)
        {
            free(entry->value);
            entry->value = malloc(strlen(value)+1);
            strcpy(entry->value, value); // unlike strncpy with strlen(), this copies the '\0'
            return;
        }

        // make loc the entry's next
        loc = &entry->next;
        // and move entry forward (we don't need prev->next now)
        entry = entry->next;
    }

    // now loc will hold the address we should put
    // the entry in
    *loc = new_entry(key, value);
}

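Of course, since the lists in the bins are not sorted or kept in any particular order (unless there are constraints you haven't mentioned), you don't need to append new entries. You can just as well prepend them. Then you don't need to drag loc along through an otherwise plain linear search. You can do something like this: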

node_ptr find_in_bin(const char *key, node_ptr bin)
{
  for (node_ptr entry = bin; entry; entry = entry->next) {
    if(strcmp(entry->key, key)==0)
      return entry;
  }
  return 0;
}

void
insert(const char *key, const char *value, HASHTABLE H)
{
    unsigned int slot = hash(key, H->table_size);
    node_ptr *bin = &H->list_ptr_arr[slot];
    node_ptr entry = find_in_bin(key, *bin);
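    if (entry) {
      free(entry->value);
      entry->value = malloc(strlen(value)+1);
      if (!entry->value) abort();
      strcpy(entry->value, value); // copies the terminating '\0' too
    } else {
      // new_entry() takes two arguments, so link the node in here:
      // prepend it to the bin's list
      entry = new_entry(key, value);
      entry->next = *bin;
      *bin = entry;
    }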
}

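If you fix initialization and insertion this way, I think the code should work. It did for the few tests I ran, but I may have missed something.

Not a bug as such, but I'll comment on it quickly anyway. The next_prime() function looks like a slow version of the sieve of Eratosthenes. That's fine, it computes a prime (unless I missed something), but it isn't what you need. If you google it, you will find tables of the first K primes for fairly large K. You can easily embed those in your code. That is, if you absolutely want your table to have a prime size. You don't need to, though. There is nothing wrong with tables of other sizes.

Hashing modulo a prime has some benefits, but a hash table doesn't need a prime size to work. If you have a large prime P and a hash table of size M, you can compute ((i % P) % M) and get the benefits of reducing modulo P along with the convenience of a table of size M. It gets easier still if M is a power of two, because then the final modulo can be a very fast bit mask: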
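
As a rough illustration of the embedded prime table idea, here is a minimal sketch. The names prime_table and table_prime_at_least are hypothetical, and the particular primes (each roughly double the previous one) are just one possible choice; a real table would likely go much higher.

```c
#include <stddef.h>

/* A small hard-coded table of primes, each roughly double the previous. */
static const unsigned int prime_table[] = {
    11, 23, 53, 97, 193, 389, 769, 1543, 3079, 6151
};

/* Hypothetical helper: smallest tabulated prime >= wanted,
   or 0 if the table does not reach that far. */
unsigned int table_prime_at_least(unsigned int wanted)
{
    size_t count = sizeof prime_table / sizeof prime_table[0];

    for (size_t i = 0; i < count; i++) {
        if (prime_table[i] >= wanted)
            return prime_table[i];
    }
    return 0;
}
```

With a table like this, picking a size when creating or resizing the hash table is a short scan instead of repeated trial division.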

#define mask_k(n,k) ((n) & ((1u << (k)) - 1))

Then later...

   int index = mask_k(i % P, k); // where table size is 1 << k

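The i % P may not be necessary either, depending on how good your hash function is. If your hash function gives you something close to random numbers, then the bits in i are random, so are the k least significant bits, and % P does nothing to improve them. But if you want to reduce modulo a prime, you can do it with a large prime and mask the result down to a smaller table size, so you don't have to use a prime table size. And if you do want a prime table size, use a table of primes. Having to compute a new prime every time you resize the table is slow.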
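
To make the "reduce modulo a large prime, then mask" idea concrete, here is a small sketch. Everything named in it is an assumption for illustration: string_hash is a simple multiply-by-31 string hash (not the question's function), BIG_PRIME is 2^31 - 1, and bin_index is a hypothetical helper. Note that the question's hash() reads its accumulator before ever assigning it, so the sketch starts the accumulator at 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed large prime (2^31 - 1); any prime well above the table size would do. */
#define BIG_PRIME 2147483647u

/* Illustrative multiplicative string hash (not the question's function).
   The accumulator starts from a defined value before it is read. */
uint64_t string_hash(const char *key)
{
    uint64_t h = 0;

    for (size_t i = 0; key[i]; i++)
        h = h * 31u + (unsigned char)key[i];
    return h;
}

/* Bin index for a table of 1 << k bins (k < 32 assumed):
   reduce modulo the prime, then keep the k lowest bits. */
uint32_t bin_index(const char *key, unsigned int k)
{
    uint64_t reduced = string_hash(key) % BIG_PRIME;

    return (uint32_t)(reduced & ((UINT64_C(1) << k) - 1));
}
```

For a table with 16 bins, bin_index(key, 4) always lands in the range 0 to 15, so the result can index the bin array directly.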
