
Segmentation fault when using fscanf in C

I am really trying to learn, so I would appreciate it if someone could educate me on the principles I may be missing here. I thought I had everything covered, but it seems I am doing something incorrectly.

The following code gives me a segmentation fault, and I cannot figure out why. I am adding the & in front of the argument name being passed to fscanf .

int word_size = 0;

#define HASH_SIZE 65536

#define LENGTH = 45

node* global_hash[HASH_SIZE] = {NULL};

typedef struct node {
  char word[LENGTH + 1];
  struct node* next;
} node;

int hash_func(char* hash_val){
    int h = 0;
    for (int i = 0, j = strlen(hash_val); i < j; i++){
        h = (h << 2) ^ hash_val[i];
    }
    return h % HASH_SIZE;
}

bool load(const char *dictionary)
{
    char* string;
    FILE* dic = fopen(dictionary, "r");
    if(dic == NULL){
        fprintf(stdout, "Error: File is NULL.");
        return false;
    }
    while(fscanf(dic, "%ms", &string) != EOF){
        node* new_node = malloc(sizeof(node));
        if(new_node == NULL){
            return false;
        }
        strcpy(new_node->word, string);
        new_node->next = NULL;
        int hash_indx = hash_func(new_node->word);
        node* first = global_hash[hash_indx];
        if(first == NULL){
            global_hash[hash_indx] = new_node;
        } else {
            new_node->next = global_hash[hash_indx];
            global_hash[hash_indx] = new_node;
        }
        word_size++;
        free(new_node);
    }
    fclose(dic);
    return true;
}

dictionary.c:25:16: runtime error: left shift of 2127912344 by 2 places cannot be represented in type 'int'
dictionary.c:71:23: runtime error: index -10167 out of bounds for type 'node *[65536]'
dictionary.c:73:13: runtime error: index -10167 out of bounds for type 'node *[65536]'
dictionary.c:75:30: runtime error: index -22161 out of bounds for type 'node *[65536]'
dictionary.c:76:13: runtime error: index -22161 out of bounds for type 'node *[65536]'

Segmentation fault

Update after OP posted more code

The problem is that your hash_func works with signed integers and that it overflows. Therefore you get a negative return value (or rather undefined behavior).

That is also what these lines tell you:

dictionary.c:25:16: runtime error: left shift of 2127912344 by 2 places cannot be represented in type 'int'

Here it tells you that you have a signed integer overflow.

dictionary.c:71:23: runtime error: index -10167 out of bounds for type 'node *[65536]'

Here it tells you that you use a negative index into an array (i.e. global_hash ).

Try using unsigned integers instead:

unsigned int hash_func(char* hash_val){
    unsigned int h = 0;
    for (int i = 0, j = strlen(hash_val); i < j; i++){
        h = (h << 2) ^ hash_val[i];
    }
    return h % HASH_SIZE;
}

and call it like:

unsigned int hash_indx = hash_func(new_node->word);

Original answer

I'm not sure this is the root cause of all the problems, but it seems you have some issues with memory allocation.

Each time you call fscanf , new dynamic memory is allocated for string due to %ms . However, you never free that memory, so you have a leak.

Further, this looks like a major problem:

        global_hash[hash_indx] = new_node;  // Here you save new_node
    } else {
        new_node->next = global_hash[hash_indx];
        global_hash[hash_indx] = new_node;  // Here you save new_node
    }
    word_size++;
    free(new_node);  // But here you free the memory

So it seems your table holds pointers to memory that has already been freed.

That is a major problem that may cause seg faults when you use the pointers.

Maybe change this

free(new_node); 

to

free(string);

In general I suggest that you avoid %ms and also avoid fscanf . Use char string[LENGTH + 1] and fgets instead.
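A minimal sketch of that fgets suggestion, assuming one word per line as in typical dictionary files ( read_word is a hypothetical helper name, not from the answer):

```c
#include <stdio.h>
#include <string.h>

#define LENGTH 45

/* Sketch: read one word per line with fgets instead of fscanf.
   Returns 1 when a word was read into buf, 0 on EOF or error. */
int read_word(char buf[LENGTH + 1], FILE *fp) {
    char line[LENGTH + 2];             /* longest word + '\n' + '\0' */
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;                      /* end of file or read error */
    line[strcspn(line, "\n")] = '\0';  /* strip the trailing newline */
    strncpy(buf, line, LENGTH);        /* truncate over-long input */
    buf[LENGTH] = '\0';
    return 1;
}
```

The buffer is a fixed automatic array, so there is nothing to free and no use-after-free hazard.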

There are multiple issues in the code posted. Here are the major ones:

  • you should use unsigned arithmetic for the hash code computation to ensure that the hash value is positive. The current implementation has undefined behavior as words longer than 15 letters cause an arithmetic overflow, which may produce a negative value and cause the modulo to be negative as well, indexing outside the bounds of global_hash .

  • You free the newly allocated node with free(new_node); . It has been stored into the global_hash array: later dereferencing it for another word with the same hash value will cause undefined behavior. You probably meant to free the parsed word instead with free(string); .

Here are the other issues:

  • you should check the length of the string before copying it to the node structure array with strcpy(new_node->word, string);

  • fscanf(dic, "%ms", &string) is not portable. The m modifier causes fscanf to allocate memory for the word, but it is a POSIX extension (supported by glibc) that may not be available in other environments. You might want to write a simple function for better portability.

  • the main loop should test for a successful conversion with while(fscanf(dic, "%ms", &string) == 1) instead of just testing for end of file with != EOF . It may not cause a problem in this specific case, but it is a common cause of undefined behavior with other conversion specifiers.

  • the definition #define HASH_SIZE 65536; has an extra ; which may cause unexpected behavior if HASH_SIZE is used in expressions.

  • the definition #define LENGTH = 45; is incorrect: the code does not compile as posted.

Here is a modified version:

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 65536
#define LENGTH 45

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

int word_size = 0;
node *global_hash[HASH_SIZE];

unsigned hash_func(const char *hash_val) {
    unsigned h = 0;
    for (size_t i = 0, j = strlen(hash_val); i < j; i++) {
        h = ((h << 2) | (h >> 30)) ^ (unsigned char)hash_val[i];
    }
    return h % HASH_SIZE;
}

/* read a word from fp, skipping initial whitespace.
   return the length of the word read or EOF at end of file
   store the word into the destination array, truncating it as needed
*/
int get_word(char *buf, size_t size, FILE *fp) {
    int c;
    size_t i;
    while (isspace(c = getc(fp)))
        continue;
    if (c == EOF)
        return EOF;
    for (i = 0;;) {
        if (i < size)
           buf[i] = c;
        i++;
        c = getc(fp);
        if (c == EOF)
            break;
        if (isspace(c)) {
            ungetc(c, fp);
            break;
        }
    }
    if (i < size)
        buf[i] = '\0';
    else if (size > 0)
        buf[size - 1] = '\0';
    return i;
}

bool load(const char *dictionary) {
    char buf[LENGTH + 1];
    FILE *dic = fopen(dictionary, "r");
    if (dic == NULL) {
        fprintf(stderr, "Error: cannot open dictionary file %s\n", dictionary);
        return false;
    }
    while (get_word(buf, sizeof buf, dic) != EOF) {
        node *new_node = malloc(sizeof(node));
        if (new_node == NULL) {
            fprintf(stderr, "Error: out of memory\n");
            fclose(dic);
            return false;
        }
        unsigned hash_indx = hash_func(buf);
        strcpy(new_node->word, buf);
        new_node->next = global_hash[hash_indx];
        global_hash[hash_indx] = new_node;
        word_size++;
    }
    fclose(dic);
    return true;
}

the following proposed code:

  1. cleanly compiles
  2. still has a major problem with the function: hash_func()
  3. separates the definition of the struct from the typedef for that struct for clarity and flexibility.
  4. properly formats the #define statements
  5. properly handles errors from fopen() and malloc()
  6. properly limits the length of the string read from the 'dictionary' file
  7. assumes that no text from the 'dictionary' file will be greater than 45 bytes.

and now, the proposed code:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

//prototypes
bool load(const char *dictionary);
int hash_func(char* hash_val);


#define HASH_SIZE 65536
#define LENGTH  45


struct node
{
    char word[LENGTH + 1];
    struct node* next;
};
typedef struct node node;


node* global_hash[HASH_SIZE] = {NULL};
int word_size = 0;

int hash_func(char* hash_val)
{
    int h = 0;
    for ( size_t i = 0, j = strlen(hash_val); i < j; i++)
    {
        h = (h << 2) ^ hash_val[i];
    }
    return h % HASH_SIZE;
}


bool load(const char *dictionary)
{
    char string[ LENGTH+1 ];
    FILE* dic = fopen(dictionary, "r");
    if(dic == NULL)
    {
        perror( "fopen failed" );
        //fprintf(stdout, "Error: File is NULL.");
        return false;
    }

    while( fscanf( dic, "%45s", string) == 1 )
    {
        node* new_node = malloc(sizeof(node));
        if(new_node == NULL)
        {
            perror( "malloc failed" );
            fclose(dic);
            return false;
        }

        strcpy(new_node->word, string);
        new_node->next = NULL;

        int hash_indx = hash_func(new_node->word);

        // following statement for debug:
        printf( "index returned from hash_func(): %d\n", hash_indx );

        if( !global_hash[hash_indx] )
        {
            global_hash[hash_indx] = new_node;
        }

        else
        {
            new_node->next = global_hash[hash_indx];
            global_hash[hash_indx] = new_node;
        }

        word_size++;
    }
    fclose(dic);
    return true;
}
