Storing words into a hashtable

I have a text file of English words, one word per line. I'm a beginner in C. I'm using load and unload functions to store all the words in a hash table (separate chaining) and then free them from memory, but I have run into some problems.

The functions (the code in main.c is correct):

load:

#include <stdbool.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <stdio.h>

#include "dictionary.h"

#define SIZE 26

typedef  struct node
{
    char word[LENGTH+1];
    struct node *next;
}node;

unsigned int hash_num = 0;
node *hashtable[SIZE];  //26 letters in alphabet

node *head = NULL;

// hashfunction
unsigned int hash(char const *key)  
{
    unsigned int hash= tolower(key[0]) - 'a';
    return hash % SIZE;
}

/**
 * Loads dictionary into memory.  Returns true if successful else false.
 */
bool load(const char* dictionary)
{

    unsigned int hash_index=0;

    FILE *fp = fopen(dictionary, "r");
    if(fp == NULL)
    {
        fprintf(stderr, "Couldn't open %s",dictionary);
        return false;
    }

    //dictionary 


    while(!feof(fp))
    {
        node *temp = malloc(sizeof(node));
        if (temp == NULL)
        {
            unload();
            fclose(fp);
            return false;
        }


        if(fscanf(fp , "%s", temp->word) == 1)   //storing word of dictionary in my new_node -> word
        {
            hash_index = hash(temp->word); 
            head= hashtable[hash_index];    //head always points to first element of linked list (containing word of dictionary)


            temp->next = head;  
            head = temp;        


            hash_num++;
        }
        else    //if fscanf couldn't store the word (return 0)
        {
            free(temp);    //free last temp
            break;
        }
    }

    fclose(fp);
    return true;

}

unload:

bool unload(void)
{

    for(int i=0; i<SIZE; i++)
    {
        if(hashtable[i] != NULL)      //if hashtable isn't NULL (has nodes)
        {
            node *cursor = hashtable[i];        //cursor points at head of individual linked list
            while(cursor != NULL)       //free them
            {
                node *temp = cursor;
                cursor = cursor->next;
                free(temp);
            }
        }
    }

    return true;
}

Can anyone tell me if the logic is correct? Whenever I run valgrind it tells me that all my nodes were allocated but just 3 free'd.

total heap usage: 143,094 allocs, 3 frees, 8,014,288 bytes allocated
LEAK SUMMARY:
==15903==    definitely lost: 8,013,040 bytes in 143,090 blocks
==15903==    indirectly lost: 0 bytes in 0 blocks
==15903==      possibly lost: 0 bytes in 0 blocks

Looking at the provided source code (dictionary.h is missing), the main problem is in the load() function.

Problem 1 (Main) - hashtable[] is never updated when a new word/node is added (after computing hash_index = hash(temp->word);). The new node is linked only to the global head, so every bucket stays NULL, unload() finds nothing to free, and valgrind reports the nodes as lost.

Because each new node is inserted at the front of its list (so the list ends up in reverse order), hashtable[hash_index] must be updated to point to the new node (the allocated temp node).

temp->next = head;
head = temp;

hashtable[hash_index] = head; // update the hashtable[] pointer

hash_num++;

Alternative solution without the global variable head:

temp->next = hashtable[hash_index]; // link to the current first node (or NULL)
hashtable[hash_index] = temp; // update the hashtable[] pointer

hash_num++;

Instead of

temp->next = head;
head = temp;

hash_num++;
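
To see the fix end to end, here is a minimal, self-contained sketch of front insertion followed by a full cleanup. It is not the asker's exact load(): insert_word() and unload_count() are hypothetical helpers that take words directly instead of reading a file, and LENGTH is assumed to be 45 because dictionary.h is not shown in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 26
#define LENGTH 45   /* assumption: the real value lives in dictionary.h */

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

node *hashtable[SIZE];       /* one bucket per first letter */
unsigned int hash_num = 0;   /* number of stored words */

unsigned int hash(const char *key)
{
    unsigned int h = (unsigned int)tolower((unsigned char)key[0]) - 'a';
    return h % SIZE;
}

/* Hypothetical helper: pushes one word onto the front of its bucket. */
bool insert_word(const char *word)
{
    assert(word != NULL);
    if (strlen(word) > LENGTH)
        return false;

    node *temp = malloc(sizeof(node));
    if (temp == NULL)
        return false;
    strcpy(temp->word, word);

    unsigned int hash_index = hash(temp->word);
    temp->next = hashtable[hash_index];   /* old first node, or NULL */
    hashtable[hash_index] = temp;         /* the bucket now points at the node */
    hash_num++;
    return true;
}

/* Hypothetical helper: frees every node and returns how many were freed. */
unsigned int unload_count(void)
{
    unsigned int freed = 0;
    for (int i = 0; i < SIZE; i++)
    {
        node *cursor = hashtable[i];
        while (cursor != NULL)
        {
            node *next = cursor->next;
            free(cursor);
            cursor = next;
            freed++;
        }
        hashtable[i] = NULL;   /* leave the bucket reusable */
    }
    hash_num = 0;
    return freed;
}
```

Because every node is reachable from its bucket, the number of frees matches the number of allocations, which is what valgrind should report once the bucket pointer is updated.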

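Problem 2 (Small) - unload() relies on every hashtable[] entry being either NULL or a valid list.

In the unload() function, in the for-loop, the condition if(hashtable[i] != NULL) assumes that each item of the array starts out as NULL. Because hashtable is declared at file scope, it has static storage duration and C already initializes every pointer to NULL, so this is not the cause of the leak. However, unload() does not reset the entries after freeing them, so the table holds dangling pointers afterwards.

If load() can be called again after unload(), add a for-loop at the beginning of load(), or before calling it, to reset each pointer.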

for(int i=0; i<SIZE; i++)
{
    hashtable[i] = NULL;
}
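
The sketch below illustrates the reset loop in isolation. It assumes a stripped-down node type, and table_is_empty() and clear_table() are hypothetical helpers that do not appear in the original post. It can be used to check that a file-scope table starts out empty and that the loop restores that state.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIZE 26

typedef struct node
{
    struct node *next;
} node;

node *hashtable[SIZE];   /* file scope: static storage duration */

/* Hypothetical helper: true when every bucket is NULL. */
bool table_is_empty(void)
{
    for (int i = 0; i < SIZE; i++)
    {
        if (hashtable[i] != NULL)
            return false;
    }
    return true;
}

/* Hypothetical helper: the reset loop wrapped in a function.
   It frees nothing, so call it only after the nodes have been freed. */
void clear_table(void)
{
    for (int i = 0; i < SIZE; i++)
    {
        hashtable[i] = NULL;
    }
}
```

Note that the same loop applied to a local (automatic) array would be mandatory, since automatic variables are not initialized by default.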

Problem 3 (Potential Bug Source) - as suggested by a reviewer, head, declared as the global variable node *head = NULL;, is a potential source of bugs.

In the load() function, head is used only as temporary storage, but as a global it keeps its value between calls and can be read or written from anywhere in the program. If it is read somewhere without a known write immediately before, the result can be unexpected behavior, even though the compiler reports no error or warning.

It is best to keep the number of global variables as small as possible.
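
One possible way to do that, sketched under assumptions: bundle the buckets and the word count into a struct and pass it to each function, so that head becomes a short-lived local. The table type and the table_init(), table_add() and table_free() functions are hypothetical names, and LENGTH is assumed to be 45 since dictionary.h is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 26
#define LENGTH 45   /* assumption: the real value lives in dictionary.h */

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

/* Hypothetical type: holds the state the post keeps in globals. */
typedef struct
{
    node *buckets[SIZE];
    unsigned int count;
} table;

void table_init(table *t)
{
    assert(t != NULL);
    for (int i = 0; i < SIZE; i++)
        t->buckets[i] = NULL;
    t->count = 0;
}

bool table_add(table *t, const char *word)
{
    assert(t != NULL && word != NULL);
    if (strlen(word) > LENGTH)
        return false;

    node *temp = malloc(sizeof(node));
    if (temp == NULL)
        return false;
    strcpy(temp->word, word);

    unsigned int index =
        ((unsigned int)tolower((unsigned char)word[0]) - 'a') % SIZE;
    node *head = t->buckets[index];   /* local: exists only for this call */
    temp->next = head;
    t->buckets[index] = temp;
    t->count++;
    return true;
}

void table_free(table *t)
{
    assert(t != NULL);
    for (int i = 0; i < SIZE; i++)
    {
        node *cursor = t->buckets[i];
        while (cursor != NULL)
        {
            node *next = cursor->next;
            free(cursor);
            cursor = next;
        }
        t->buckets[i] = NULL;
    }
    t->count = 0;
}
```

A caller would declare a table, call table_init() once, and pass its address to the other functions. Whether this is worth it depends on the assignment: if the load()/unload() signatures are fixed, a file-scope table may still be needed, but head can be local either way.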

Enhancement (Preserve insertion order) - because new items are added at the front, each list ends up in reverse order. Here is a way to add new items at the end instead. Note that walking to the last node makes each insertion take time proportional to the length of the bucket.

temp->next = NULL; // the new node is always the last one
node *first = hashtable[hash_index];
if (first == NULL) {
    hashtable[hash_index] = temp; // empty bucket: temp becomes the first node
}
else {
    while (first->next!=NULL) {
        first = first->next;  // loop until last node
    }
    first->next = temp; // linking to the last node
}

hash_num++;

Instead of

head= hashtable[hash_index];    //head always points to first element ...

temp->next = head;  
head = temp;        

hash_num++;
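
The same append can be written without the special case for an empty bucket by walking a pointer to the link that should receive the new node. This is a sketch rather than the answer's own code: append_word() and free_table() are hypothetical helpers, and LENGTH is assumed to be 45 because dictionary.h is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 26
#define LENGTH 45   /* assumption: the real value lives in dictionary.h */

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

node *hashtable[SIZE];

unsigned int hash(const char *key)
{
    unsigned int h = (unsigned int)tolower((unsigned char)key[0]) - 'a';
    return h % SIZE;
}

/* Hypothetical helper: appends one word at the end of its bucket. */
bool append_word(const char *word)
{
    assert(word != NULL);
    if (strlen(word) > LENGTH)
        return false;

    node *temp = malloc(sizeof(node));
    if (temp == NULL)
        return false;
    strcpy(temp->word, word);
    temp->next = NULL;   /* the new node is always the last one */

    /* link points at the bucket slot first, then at each next field */
    node **link = &hashtable[hash(word)];
    while (*link != NULL)
        link = &(*link)->next;
    *link = temp;        /* works for empty and non-empty buckets alike */
    return true;
}

/* Hypothetical helper: frees every node and empties the buckets. */
void free_table(void)
{
    for (int i = 0; i < SIZE; i++)
    {
        node *cursor = hashtable[i];
        while (cursor != NULL)
        {
            node *next = cursor->next;
            free(cursor);
            cursor = next;
        }
        hashtable[i] = NULL;
    }
}
```

For a large dictionary, the per-insert walk adds up. If order matters, keeping a tail pointer per bucket would avoid it; if order does not matter, front insertion remains the cheaper choice.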
