Segmentation fault when using fscanf in C

I am really trying to learn, so I would appreciate it if someone could explain the principles I may be missing here. I thought I had everything covered, but it seems I am doing something incorrectly.

The following code gives me a segmentation fault, and I cannot figure out why. I am adding the & in front of the argument name passed to fscanf.

int word_size = 0;

#define HASH_SIZE 65536

#define LENGTH = 45

node* global_hash[HASH_SIZE] = {NULL};

typedef struct node {
  char word[LENGTH + 1];
  struct node* next;
} node;

int hash_func(char* hash_val){
    int h = 0;
    for (int i = 0, j = strlen(hash_val); i < j; i++){
        h = (h << 2) ^ hash_val[i];
    }
    return h % HASH_SIZE;
}

bool load(const char *dictionary)
{
    char* string;
    FILE* dic = fopen(dictionary, "r");
    if(dic == NULL){
        fprintf(stdout, "Error: File is NULL.");
        return false;
    }
    while(fscanf(dic, "%ms", &string) != EOF){
        node* new_node = malloc(sizeof(node));
        if(new_node == NULL){
            return false;
        }
        strcpy(new_node->word, string);
        new_node->next = NULL;
        int hash_indx = hash_func(new_node->word);
        node* first = global_hash[hash_indx];
        if(first == NULL){
            global_hash[hash_indx] = new_node;
        } else {
            new_node->next = global_hash[hash_indx];
            global_hash[hash_indx] = new_node;
        }
        word_size++;
        free(new_node);
    }
    fclose(dic);
    return true;
}

dictionary.c:25:16: runtime error: left shift of 2127912344 by 2 places cannot be represented in type 'int'
dictionary.c:71:23: runtime error: index -10167 out of bounds for type 'node *[65536]'
dictionary.c:73:13: runtime error: index -10167 out of bounds for type 'node *[65536]'
dictionary.c:75:30: runtime error: index -22161 out of bounds for type 'node *[65536]'
dictionary.c:76:13: runtime error: index -22161 out of bounds for type 'node *[65536]'

Segmentation fault

Update after OP posted more code

The problem is that your hash_func works with signed integers and that it overflows. Therefore you get a negative return value (or rather, undefined behavior).

That is also what these lines tell you:

dictionary.c:25:16: runtime error: left shift of 2127912344 by 2 places cannot be represented in type 'int'

Here it tells you that you have a signed integer overflow.

dictionary.c:71:23: runtime error: index -10167 out of bounds for type 'node *[65536]'

Here it tells you that you use a negative index into an array (i.e. global_hash).

Try using an unsigned integer instead:

unsigned int hash_func(char* hash_val){
    unsigned int h = 0;
    for (int i = 0, j = strlen(hash_val); i < j; i++){
        h = (h << 2) ^ hash_val[i];
    }
    return h % HASH_SIZE;
}

and call it like this:

unsigned int hash_indx = hash_func(new_node->word);
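
To illustrate why the signed version can yield a negative index, here is a minimal sketch. The helper names signed_index and unsigned_index are hypothetical and not part of the original program. In C, the result of % takes the sign of the left operand, so a negative h stays negative after the reduction, while an unsigned h always lands in the range 0 to HASH_SIZE - 1.

```c
#define HASH_SIZE 65536

/* Remainder with a signed left operand: the result keeps the sign of h,
   so a negative h gives a negative (out-of-bounds) index. */
int signed_index(int h) {
    return h % HASH_SIZE;
}

/* Remainder with an unsigned left operand: the result is always in
   the range [0, HASH_SIZE). */
unsigned unsigned_index(unsigned h) {
    return h % HASH_SIZE;
}
```

For example, signed_index(-10167) returns -10167, which matches the index reported by the sanitizer, whereas unsigned_index applied to the same bit pattern returns a valid index.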

Original answer

I'm not sure this is the root cause of all problems, but it seems you have some problems with memory allocation.

Each time you call fscanf you get new dynamic memory allocated for string due to %ms. However, you never free that memory, so you have a leak.

Further, this looks like a major problem:

        global_hash[hash_indx] = new_node;  // Here you save new_node
    } else {
        new_node->next = global_hash[hash_indx];
        global_hash[hash_indx] = new_node;  // Here you save new_node
    }
    word_size++;
    free(new_node);  // But here you free the memory

So it seems your table holds pointers to memory that has already been freed.

That is a major problem that may cause segmentation faults when you use the pointers.

Maybe change this

free(new_node); 

to

free(string);

In general, I suggest that you avoid %ms and also avoid fscanf. Use char string[LENGTH + 1] and fgets instead.
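
A minimal sketch of the fgets approach, assuming the dictionary file holds one word per line. The helper name read_line_word is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define LENGTH 45

/* Read one line from fp into buf (at most size - 1 characters) and strip
   the trailing newline, if any. Returns 1 on success, 0 at end of file
   or on a read error. Hypothetical helper, assuming one word per line. */
int read_line_word(char *buf, size_t size, FILE *fp) {
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

Note that fgets stores the newline too, so a buffer of LENGTH + 2 bytes is needed for a word of LENGTH letters and its newline to fit in a single call. With a smaller buffer the rest of the line is returned by the next call.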

There are multiple issues in the code posted. Here are the major ones:

  • You should use unsigned arithmetic for the hash code computation to ensure that the hash value is non-negative. The current implementation has undefined behavior, as words longer than 15 letters cause an arithmetic overflow, which may produce a negative value and cause the modulo to be negative as well, indexing outside the bounds of global_hash.

  • You free the newly allocated node with free(new_node);. It has been stored into the global_hash array: later dereferencing it for another word with the same hash value will cause undefined behavior. You probably meant to free the parsed word instead, with free(string);.
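
As a sketch of the ownership rule behind the second point: once a node is stored in the table, the table owns it, and it should be freed only when the table itself is torn down. The names table, insert_word and unload_all below are hypothetical stand-ins, not functions from the original program.

```c
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 65536
#define LENGTH 45

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

static node *table[HASH_SIZE];  /* hypothetical stand-in for global_hash */

/* Push a copy of word onto the front of the bucket's list. The node is
   NOT freed here: the table keeps pointing at it.
   Returns 0 on success, -1 on a bad index, an overlong word, or
   allocation failure. */
int insert_word(unsigned index, const char *word) {
    if (index >= HASH_SIZE || strlen(word) > LENGTH)
        return -1;
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    strcpy(n->word, word);
    n->next = table[index];
    table[index] = n;
    return 0;
}

/* Free every node once the table is no longer needed.
   Returns the number of nodes freed. */
size_t unload_all(void) {
    size_t freed = 0;
    for (size_t i = 0; i < HASH_SIZE; i++) {
        while (table[i] != NULL) {
            node *next = table[i]->next;
            free(table[i]);
            table[i] = next;
            freed++;
        }
    }
    return freed;
}
```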

Here are the other issues:

  • You should check the length of the string before copying it to the node structure array with strcpy(new_node->word, string);.

  • fscanf(dic, "%ms", &string) is not portable. The m modifier causes fscanf to allocate memory for the word, but it is an extension supported by glibc that may not be available in other environments. You might want to write a simple function for better portability.

  • The main loop should test for successful conversion with while(fscanf(dic, "%ms", &string) == 1) instead of just testing for end of file with EOF. It may not cause a problem in this specific case, but it is a common cause of undefined behavior for other conversion specifiers.

  • The definition #define HASH_SIZE 65536; has an extra ; which may cause unexpected behavior if HASH_SIZE is used in expressions.

  • The definition #define LENGTH = 45; is incorrect: the code does not compile as posted.
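
The point about the loop condition matters most with numeric conversions. A minimal sketch, with a hypothetical helper count_ints: when the next token is not a number, fscanf returns 0 rather than EOF and leaves the token in the stream, so a loop written with != EOF would never terminate and would keep using a stale value.

```c
#include <stdio.h>

/* Count the integers at the start of a stream. Stops at the first token
   that is not an integer, or at end of file. Comparing the return value
   with 1 handles both cases. Hypothetical helper for illustration. */
int count_ints(FILE *fp) {
    int value;
    int count = 0;
    while (fscanf(fp, "%d", &value) == 1)
        count++;
    return count;
}
```

Given the input "1 2 x 3", the loop stops at x after two successful conversions.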

Here is a modified version:

#include <ctype.h>
#include <stdbool.h>   /* bool, true, false */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>    /* strlen, strcpy */

#define HASH_SIZE 65536
#define LENGTH 45

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

int word_size = 0;
node *global_hash[HASH_SIZE];

unsigned hash_func(const char *hash_val) {
    unsigned h = 0;
    for (size_t i = 0, j = strlen(hash_val); i < j; i++) {
        h = ((h << 2) | (h >> 30)) ^ (unsigned char)hash_val[i];
    }
    return h % HASH_SIZE;
}

/* read a word from fp, skipping initial whitespace.
   return the length of the word read or EOF at end of file
   store the word into the destination array, truncating it as needed
*/
int get_word(char *buf, size_t size, FILE *fp) {
    int c;
    size_t i;
    while (isspace(c = getc(fp)))
        continue;
    if (c == EOF)
        return EOF;
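    /* c holds the first character of the word; on exit i is the word length */
    for (i = 0; c != EOF && !isspace(c); i++) {
        if (i < size)
            buf[i] = c;
        c = getc(fp);
    }
    /* push back the whitespace that ended the word */
    if (c != EOF)
        ungetc(c, fp);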
    if (i < size)
        buf[i] = '\0';
    else if (size > 0)
        buf[size - 1] = '\0';
    return i;
}

bool load(const char *dictionary) {
    char buf[LENGTH + 1];
    FILE *dic = fopen(dictionary, "r");
    if (dic == NULL) {
        fprintf(stderr, "Error: cannot open dictionary file %s\n", dictionary);
        return false;
    }
    while (get_word(buf, sizeof buf, dic) != EOF) {
        node *new_node = malloc(sizeof(node));
        if (new_node == NULL) {
            fprintf(stderr, "Error: out of memory\n");
            fclose(dic);
            return false;
        }
        unsigned hash_indx = hash_func(buf);
        strcpy(new_node->word, buf);
        new_node->next = global_hash[hash_indx];
        global_hash[hash_indx] = new_node;
        word_size++;
    }
    fclose(dic);
    return true;
}

The following proposed code:

  1. cleanly compiles
  2. still has a major problem with the function hash_func()
  3. separates the definition of the struct from the typedef for that struct, for clarity and flexibility
  4. properly formats the #define statements
  5. properly handles errors from fopen() and malloc()
  6. properly limits the length of the string read from the 'dictionary' file
  7. assumes that no text from the 'dictionary' file will be greater than 45 bytes

And now, the proposed code:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

//prototypes
bool load(const char *dictionary);
int hash_func(char* hash_val);


#define HASH_SIZE 65536
#define LENGTH  45


struct node
{
    char word[LENGTH + 1];
    struct node* next;
};
typedef struct node node;


node* global_hash[HASH_SIZE] = {NULL};
int word_size = 0;

int hash_func(char* hash_val)
{
    int h = 0;
    for ( size_t i = 0, j = strlen(hash_val); i < j; i++)
    {
        h = (h << 2) ^ hash_val[i];
    }
    return h % HASH_SIZE;
}


bool load(const char *dictionary)
{
    char string[ LENGTH+1 ];
    FILE* dic = fopen(dictionary, "r");
    if(dic == NULL)
    {
        perror( "fopen failed" );
        //fprintf(stdout, "Error: File is NULL.");
        return false;
    }

    while( fscanf( dic, "%45s", string) == 1 )
    {
        node* new_node = malloc(sizeof(node));
        if(new_node == NULL)
        {
            perror( "malloc failed" );
            fclose(dic);   // do not leak the open file on failure
            return false;
        }

        strcpy(new_node->word, string);
        new_node->next = NULL;

        int hash_indx = hash_func(new_node->word);

        // following statement for debug:
        printf( "index returned from hash_func(): %d\n", hash_indx );

        if( !global_hash[hash_indx] )
        {
            global_hash[hash_indx] = new_node;
        }

        else
        {
            new_node->next = global_hash[hash_indx];
            global_hash[hash_indx] = new_node;
        }

        word_size++;
    }
    fclose(dic);
    return true;
}
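
The "%45s" width in the loop above is hard-coded and must be kept in sync with LENGTH by hand. One possible alternative, sketched here with hypothetical macro names STR and XSTR and a hypothetical helper scan_word, is to build the format string from LENGTH with the preprocessor. This assumes LENGTH is defined as a plain integer literal, not as an expression.

```c
#include <stdio.h>

#define LENGTH 45
#define STR(x) #x
#define XSTR(x) STR(x)   /* expands LENGTH to 45 before stringifying */

/* Parse the first whitespace-delimited word of text into buf, storing at
   most LENGTH characters plus the terminator. buf must have room for
   LENGTH + 1 bytes. Returns 1 if a word was read, 0 otherwise. */
int scan_word(const char *text, char *buf) {
    /* The format string is assembled at compile time as "%45s". */
    return sscanf(text, "%" XSTR(LENGTH) "s", buf) == 1;
}
```

The same format expression can be passed to fscanf, so changing LENGTH updates both the buffer size and the field width.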
