
Segmentation fault using strsep

I'm trying to use strsep to remove extra characters in a CSV file. The problem is that when I run it, it gives me a segmentation fault and I can't figure out why. Here's the code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <ctype.h>

typedef struct {

    int id, followers, following, public_gists; 
    int public_repos;
    char *login;
    char *type;
    char *created_at;
    int *follower_list;
    int *following_list;
        
} *User;

void checkUsersFile();
FILE *createCSV();
void corrigirFicheiro();
User criarUser();

int count = 0;

void checkUsersFile() {
    //Input file ("users.csv")
    FILE *file = fopen("ficheirosG1/users-set2.csv", "r");
    
    //Check that the "users.csv" file exists
    if(!file) {
        printf("Ficheiro não encontrado");
        return;
    }

    //Create the "users-ok.csv" file
    FILE *newFile = createCSV("users-ok.csv");

    corrigirFicheiro(file, newFile);

    printf("%d\n", count);
}

//Create and return the "users-ok.csv" file
FILE *createCSV(char *nome) {
    FILE *file = fopen(nome, "w");
    return file;
}

//Reads the "users.csv" file and writes the correct data to the "users-ok.csv" file
void corrigirFicheiro(FILE *file, FILE *newFile) {

    //imprimirPrimeiraLinha(file, newFile);

    char string[200000];
    //One line of the file, with at most 200,000 characters

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            //1. Create user
            //2. Print user

            User user = criarUser(&string);
            if (user != NULL) {
                printf("ok\n");
            }
            free(user);
            
        }
    }

    //free(string);

}

//Create a User from one line of the file
User criarUser(char *str) {
    
    User novoUser;

    novoUser = (User) malloc(sizeof(User));
    
    for(int i = 0; i<10; i++) {

        //char *a = strdup(strsep(&str, ";"));
        //char *b = strdup(strsep(&a, "\n"));
        char *p = strsep(&str, ";\n\r");

        if (strlen(p) == 0) {
            count++;
            free(novoUser);
            return NULL;
        }
            
    }

    return novoUser;
}


int main(){
    checkUsersFile();

    return 0;
}

Debugging the code with gdb, I found that the fault occurs on the line if (strlen(p) == 0) {, so it doesn't even enter the switch case. I don't know why this is happening.

Thank you

I see no reason to think that the strsep() call is responsible for the error you encounter.

This is wrong, however:

 User novoUser = (User) malloc(sizeof(User));

and it very likely is responsible for your error.

User is a pointer type, so sizeof(User) is the size of a pointer, which is not large enough for a structure of the kind that a User points to. When you later try to assign to the members of the structure to which it points (omitted) or to access them in printUser() (also omitted), you will overrun the bounds of the allocated object. That's exactly the kind of thing that might cause a segfault.

An excellent idiom for expressing an allocation such as that is to use the receiving variable to establish the amount of space to allocate:

    User novoUser = malloc(sizeof(*novoUser));

Note that I have also removed the unneeded cast.


As I expressed in comments, however, it is poor style to hide pointer nature behind a typedef, as your User does, and personally, I don't much care even for most typedefs that avoid that pitfall.

Here is how you could do it with a better typedef:

typedef struct {
    int id, followers, following, public_gists; 
    int public_repos;
    char *login;
    char *type;
    char *created_at;
    int *follower_list;
    int *following_list;
} User;  // not a pointer

// ...

User *criarUser(char *str) {
    // ...
    User *novoUser = malloc(sizeof(*novoUser));  // Note: no change on the right-hand side
    // ...
    return novoUser;
}

But this is how I would do it, without a typedef:

struct user {
    int id, followers, following, public_gists; 
    int public_repos;
    char *login;
    char *type;
    char *created_at;
    int *follower_list;
    int *following_list;
};

// ...

struct user *criarUser(char *str) {
    // ...
    struct user *novoUser = malloc(sizeof(*novoUser));  // still no change on the right-hand side
    // ...
    return novoUser;
}

In answer to your question, I wrote a short example that illustrates what is needed. In essence, you need to allocate storage for string and pass the address of string to criarUser(). You cannot use an array here, because the type passed to criarUser() would be pointer-to-array, not pointer-to-pointer. (Note: you can use the array so long as it is allowed to decay to a pointer, so that taking the address does not produce a pointer-to-array; there is an example at the end below.)

A close example of corrigirFicheiro() with the changes needed to use allocated storage is:

void corrigirFicheiro(FILE *file, FILE *newFile)
{

    char *string = malloc (200000);           /* allocate for string */
    if (string == NULL) {                     /* validate allocation */
        perror ("malloc");
        return;
    }
    char *original_ptr = string;              /* save original pointer */

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            User *user = criarUser(&string);  /* pass address of string */
            if (user != NULL) {
                printUser(user, newFile);
            }
            free(user);
        }
        string = original_ptr;                /* strsep() advanced string; reset before next fgets() */
    }
    free (original_ptr);                      /* free original pointer */
}

An abbreviated struct is used in the following example, but the principle is the same. For sample input, you can simply pipe a couple of lines from printf, read them on stdin and write to stdout. I used:

$ printf "one;two;3\nfour;five;6\n" | ./bin/strsep_example

A short MCVE (where I have taken the liberty of removing the typedef'ed pointer) would be:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC      128
#define DELIM     ";\r\n"
#define NFIELDS   3

typedef struct {
  char login[MAXC];
  char type[MAXC];
  int public_repos;
} User;

void printUser (User *user, FILE *newFile)
{
  fprintf (newFile, "\nlogin : %s\ntype  : %s\nrepo  : %d\n",
            user->login, user->type, user->public_repos);
}

User *criarUser(char **str)
{
    User *novoUser = malloc(sizeof *novoUser);
    if (novoUser == NULL)                     /* validate allocation */
        return NULL;
    
    for(int i = 0; i<NFIELDS; i++) {

        char *p = strsep(str, DELIM);
        if (p == NULL || *p == '\0') {        /* ran out of fields, or empty field */
            free(novoUser);
            return NULL;
        }
        switch (i) {   /* snprintf() bounds the copy to MAXC - 1 chars */
          case 0: snprintf (novoUser->login, sizeof novoUser->login, "%s", p);
            break;
          case 1: snprintf (novoUser->type, sizeof novoUser->type, "%s", p);
            break;
          case 2: novoUser->public_repos = atoi(p);
            break;
        }
    }

    return novoUser;
}

void corrigirFicheiro(FILE *file, FILE *newFile)
{

    char *string = malloc (200000);           /* allocate for string */
    if (string == NULL) {                     /* validate allocation */
        perror ("malloc");
        return;
    }
    char *original_ptr = string;              /* save original pointer */

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            User *user = criarUser(&string);  /* pass address of string */
            if (user != NULL) {
                printUser(user, newFile);
            }
            free(user);
        }
        string = original_ptr;                /* strsep() advanced string; reset before next fgets() */
    }
    free (original_ptr);                      /* free original pointer */
}

int main (int argc, char **argv) {
  
  /* use filename provided as 1st argument (stdin by default) */
  FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

  if (!fp) {  /* validate file open for reading */
    perror ("file open failed");
    return 1;
  }
  
  corrigirFicheiro(fp, stdout);
  
  if (fp != stdin)   /* close file if not stdin */
    fclose (fp);
}

Example Use/Output

$ printf "one;two;3\nfour;five;6\n" | ./bin/strsep_example

login : one
type  : two
repo  : 3

login : four
type  : five
repo  : 6

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.

$ printf "one;two;3\nfour;five;6\n" | valgrind ./bin/strsep_example
==6411== Memcheck, a memory error detector
==6411== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6411== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==6411== Command: ./bin/strsep_example
==6411==

login : one
type  : two
repo  : 3

login : four
type  : five
repo  : 6
==6411==
==6411== HEAP SUMMARY:
==6411==     in use at exit: 0 bytes in 0 blocks
==6411==   total heap usage: 5 allocs, 5 frees, 205,640 bytes allocated
==6411==
==6411== All heap blocks were freed -- no leaks are possible
==6411==
==6411== For counts of detected and suppressed errors, rerun with: -v
==6411== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have questions.

Using An Array

When you pass the array as a parameter, it is converted to a pointer to its first element (array/pointer conversion). Once that has happened, the parameter is an ordinary char * with no array type attached, so you can take its address and use automatic storage with strsep(), as pointed out by @thebusybee.

The changes to the program above to do so would be:

User *criarUser(char *str)
{
    User *novoUser = malloc(sizeof *novoUser);
    if (novoUser == NULL)                     /* validate allocation */
        return NULL;
    
    for(int i = 0; i<NFIELDS; i++) {

        char *p = strsep(&str, DELIM);
        if (p == NULL || *p == '\0') {        /* ran out of fields, or empty field */
            free(novoUser);
            return NULL;
        }
        switch (i) {   /* snprintf() bounds the copy to MAXC - 1 chars */
          case 0: snprintf (novoUser->login, sizeof novoUser->login, "%s", p);
            break;
          case 1: snprintf (novoUser->type, sizeof novoUser->type, "%s", p);
            break;
          case 2: novoUser->public_repos = atoi(p);
            break;
        }
    }

    return novoUser;
}

void corrigirFicheiro(FILE *file, FILE *newFile)
{

    char string[200000];

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            User *user = criarUser(string);  /* pass string, as pointer */
            if (user != NULL) {
                printUser(user, newFile);
            }
            free(user);
        }
    }
}

But note: without the benefit of array/pointer conversion when passing string as a parameter, you must use allocated storage (or, more generally, a separate char * variable whose address you can take). Up to you, but know the caveat.
