Function to read csv and return a 2d array in C

I just started with C, and I've spent all day trying to figure this out; it's driving me crazy. I'm trying to create a function to read a CSV file like this one:

10190935A;Sonia;Arroyo;Quintana;M;70
99830067Q;Josefa;Cuenca;Orta;M;42
28122337F;Nuria;Garriga;Dura;M;43
03265079E;Manuel;Orts;Robles;H;45

And create a 2D array and return it so I can use it later in other functions. This is the function:

void cargarPacientes ()
{

    FILE *file = fopen (".\pacientes.csv", "r");
    char buffer[256 * 500];
    char *arrayOfLines[500];
    char *line = buffer;
    size_t buf_siz = sizeof (buffer);
    int i = 0, n;

    while (fgets (line, buf_siz, file)) {
        char *p = strchr (line, '\n');
        if (p) {
            *p = '\0';
        } else {
            p = strchr (line, '\0');
        }
        arrayOfLines[i++] = line;
        buf_siz -= p - line + 1;
        if (p + 1 == buffer + sizeof (buffer)) {
            break;
        }
        line = p + 1;
    }
    fclose (file);
    n = i;
    int y = 0;
    char *pacientes[20][6];
    for (i = 0; i < n; ++i) {
        char *token;
        char *paciente[6];
        int x = 0;
        token = strtok (arrayOfLines[i], ";");
        while (token != NULL) {
            paciente[x] = token;
            pacientes[y][x] = token;
            token = strtok (NULL, ";");
            x++;
        }
        y++;
    }
    // return pacientes;
}

I also tried using structures, but I really don't know how they work. This is the structure:

struct Paciente {
    char dni[9];
    char nombre[20];
    char primerApellido[20];
    char segundoApellido[20];
    char sexo[1];
    int edad;
};

Is there any way to return the array from that function, or is there an easier way to do the same thing? I've also tried this, but I'm having problems; it won't even compile.

    void cargarPacientes(size_t N, size_t M, char *pacientes[N][M]
    void main(){
        char *pacientes[20][6];
        cargarPacientes(20, 6, pacientes);
    }

These are the compiler errors (sorry, they are in Spanish):

C:\Users\Nozomu\CLionProjects\mayo\main.c(26): error C2466: no se puede asignar una matriz de tamaño constante 0
C:\Users\Nozomu\CLionProjects\mayo\main.c(26): error C2087: 'pacientes': falta el subíndice
C:\Users\Nozomu\CLionProjects\mayo\main.c(88): warning C4048: subíndices de matriz distintos: 'char *(*)[1]' y 'char *[20][6]'

If I understand correctly that you want to read your file and separate each line into a struct Paciente, then the easiest way to do so is to allocate a block of memory holding some anticipated number of struct Paciente and fill each with the data read from your file, keeping track of the number of structs filled. When the number of structs filled equals the number you have allocated, you simply realloc to increase the number of structs available and keep going.

This is made easier by the fact that your struct Paciente contains only fixed-size members, so none of them needs any further allocation of its own.

The basic approach is straightforward. You allocate a block of memory in cargarPacientes() to hold each struct read from the file. The function takes a pointer as a parameter and updates the value at that memory location with the number of structs filled. It returns a pointer to the allocated block containing your struct elements, making them available back in the caller, and the number of structs filled is available through the pointer you passed as a parameter.

You also generally want to pass an open FILE* pointer as a parameter to your function for it to read data from. (If you can't successfully open the file back in the caller, there is no reason to make the function call to fill your structs in the first place.) Changing your function declaration slightly to accommodate the open FILE* pointer and the pointer to the number of structs filled, you could do:

struct Paciente *cargarPacientes (FILE *fp, size_t *n)

(or, after creating a typedef for your struct for convenience [see below], you could do)

Paciente *cargarPacientes (FILE *fp, size_t *n)

Looking at the setup to read your file, in main() you would want to declare a pointer to struct, a variable to hold the count of structs read, and a FILE* pointer to your file stream, e.g.

int main (int argc, char **argv) {

    Paciente *pt = NULL;
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    ...
    pt = cargarPacientes (fp, &n);  /* call function assign allocated return */

Other than the validations on the fopen and on the return of cargarPacientes(), that is all you need in main().

The work will be done in your cargarPacientes() function. To begin, simply declare a buffer large enough to hold each line, a variable to track the number of structs allocated, and a pointer to the block of memory holding your collection of struct Paciente. (MAXC is defined as a constant of 1024, and NP is defined as 2 to allocate storage for 2 struct Paciente initially.)

Paciente *cargarPacientes (FILE *fp, size_t *n)
{
    char buf[MAXC];     /* buffer to read each line */
    size_t npt = NP;    /* no. Paciente struct allocated */
    Paciente *pt = malloc (npt * sizeof *pt);   /* allocate initial NP struct */

As with any allocation, before you make use of the block you have allocated, always validate that the allocation succeeded, e.g.

    if (!pt) {          /* validate every allocation */
        perror ("malloc-pt");
        return NULL;
    }

(note: on error, your function returns NULL instead of the address of an allocated block of memory to indicate failure)

Now simply read each line and parse the semicolon-separated values into a temporary struct. This allows you to validate that you were able to parse the values into the individual members before assigning the struct to one of the allocated structs in your block of memory, e.g.

    while (fgets (buf, MAXC, fp)) {     /* read each line into buf */
        Paciente tmp = { .dni = "" };   /* temp struct to hold values */
        /* parse line into separate member values using sscanf */
        if (sscanf (buf, FMT, tmp.dni, tmp.nombre, tmp.primerApellido,
                    tmp.segundoApellido, &tmp.sexo, &tmp.edad) == 6) {

note: FMT is defined as a string literal. Also note that the size of dni has increased from 9 char to 10 char so it can be treated as a string (9 characters plus the terminating '\0'), and sexo has been declared as a single char instead of an array char [1], e.g.

#define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"
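To see what the field widths in FMT buy you, here is a minimal sketch of parsing a single line in isolation. The helper name parse_line is hypothetical (it is not part of the code in this answer), and the struct is repeated with literal sizes only so the snippet stands alone. Each width is one less than the size of its array, leaving room for the terminating '\0'. A field that is too long stops the match at the following ';' instead of overflowing the array, so sscanf returns fewer than 6 and the line is rejected.

```c
#include <stdio.h>

#define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"

/* array sizes are the FMT widths plus one byte for the '\0' */
typedef struct Paciente {
    char dni[10];
    char nombre[20];
    char primerApellido[20];
    char segundoApellido[20];
    char sexo;
    int edad;
} Paciente;

/* parse_line is a hypothetical helper used only for this sketch.
 * Returns 1 if all six fields were converted and stores them in *out,
 * otherwise returns 0 and leaves *out untouched. */
int parse_line (const char *line, Paciente *out)
{
    Paciente tmp = { .dni = "" };   /* zero-initialise every member */
    int nconv = sscanf (line, FMT, tmp.dni, tmp.nombre, tmp.primerApellido,
                        tmp.segundoApellido, &tmp.sexo, &tmp.edad);

    if (nconv != 6)     /* short, malformed or over-long field */
        return 0;

    *out = tmp;         /* struct assignment copies all members */
    return 1;
}
```

Calling parse_line with a complete line such as the first row of the sample file fills the struct, while a truncated line such as "10190935A;Sonia;Arroyo" is rejected because only three conversions succeed.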

If you successfully parse the line of data into your temporary struct, you next check whether the number of structs you have filled equals the number allocated and, if so, realloc to increase the amount of memory available. (You can add as little as 1 additional struct [inefficient], or you can scale the amount of memory allocated by some factor. Here we just double the number of structs allocated, beginning from 2.)

            /* check if used == allocated to check if realloc needed */
            if (*n == npt) {
                /* always realloc using temporary pointer */
                void *ptmp = realloc (pt, 2 * npt * sizeof *pt);
                if (!ptmp) {    /* validate every realloc */
                    perror ("realloc-pt");
                    break;
                }
                pt = ptmp;      /* assign newly sized block to pt */
                npt *= 2;       /* update no. of struct allocated */
            }

(note: you must realloc using a temporary pointer, because if realloc fails it returns NULL. If you assign that to your original pointer, you create a memory leak: the address of the original block is lost, so it can no longer be freed.)
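The same temporary-pointer pattern can be looked at on its own, away from the file parsing. The sketch below applies it to a plain growable array of int. The names IntVec and intvec_push are made up for illustration, and starting from an empty block (relying on realloc (NULL, size) behaving like malloc) is a choice of this sketch rather than what the answer's function does.

```c
#include <stdlib.h>

/* IntVec and intvec_push are hypothetical names used only in this sketch */
typedef struct {
    int *data;      /* allocated block (NULL while empty) */
    size_t used;    /* number of elements filled */
    size_t cap;     /* number of elements allocated */
} IntVec;

/* Append value, doubling the capacity whenever the block is full.
 * Returns 1 on success, 0 if realloc fails; on failure v is unchanged,
 * so the caller can still use and later free v->data. */
int intvec_push (IntVec *v, int value)
{
    if (v->used == v->cap) {
        size_t newcap = v->cap ? 2 * v->cap : 2;
        /* realloc into a temporary so v->data survives a failure */
        void *tmp = realloc (v->data, newcap * sizeof *v->data);
        if (!tmp)
            return 0;
        v->data = tmp;
        v->cap = newcap;
    }
    v->data[v->used++] = value;
    return 1;
}
```

Because the original pointer is only overwritten after realloc has succeeded, a failure part-way through still leaves everything stored so far reachable.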

All that remains is assigning your temporary struct to the next element of the allocated block of memory and updating the number filled, e.g.

            pt[(*n)++] = tmp;   /* assign struct to next struct */
        }
    }

That's it. Return the pointer to your allocated block and you are done:

    return pt;  /* return pointer to allocated block of mem containing pt */
}

To avoid sprinkling magic numbers throughout your code, a set of constants is defined for 2, 10, 20, 1024 using a global enum. You could accomplish the same thing using an individual #define statement for each; the global enum is just convenient for defining multiple integer constants in a single line.

enum { NP = 2, DNI = 10, NAME = 20, MAXC = 1024 };

#define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"

Now you no longer have individual numbers in your struct definition, and changing the constant and the FMT string is all that is required if you need to change the size of any member of your struct. (You cannot use the enum constants or variables inside the sscanf format string, so literal numbers are always required there.)
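As an aside, if the widths are defined as preprocessor macros instead of enum constants, a common two-level stringification idiom can paste them into the format string, so each number is written only once. This is an alternative sketch, not what the code in this answer does. The names DNI_W, NAME_W, STR, XSTR, FMT2 and parse_dni_nombre are all made up for illustration, and only the first two fields are parsed to keep it short.

```c
#include <stdio.h>

/* field widths as macros (assumption: #define replaces the enum here,
 * because the preprocessor cannot see the value of an enum constant) */
#define DNI_W   9
#define NAME_W  19

/* two-level expansion: XSTR(DNI_W) becomes the string literal "9" */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* adjacent string literals are concatenated: "%9[^;];%19[^;]" */
#define FMT2 "%" XSTR(DNI_W) "[^;];%" XSTR(NAME_W) "[^;]"

/* parse_dni_nombre is a hypothetical helper used only for this sketch.
 * Returns the number of fields converted (2 on success). */
int parse_dni_nombre (const char *line, char dni[DNI_W + 1],
                      char nombre[NAME_W + 1])
{
    return sscanf (line, FMT2, dni, nombre);
}
```

The trade-off is that the macros must expand to plain decimal literals (an expression such as 10 - 1 would be pasted into the format verbatim), which is one reason to simply write the numbers in the format string as done here.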

typedef struct Paciente {
    char dni[DNI];
    char nombre[NAME];
    char primerApellido[NAME];
    char segundoApellido[NAME];
    char sexo;
    int edad;
} Paciente;

To avoid hardcoding the filename, the program takes the filename to read from as its first argument (or reads from stdin if no argument is provided). This avoids having to recompile your program every time the name of your input file changes.

Putting it all together, you could do:

#include <stdio.h>
#include <stdlib.h>

enum { NP = 2, DNI = 10, NAME = 20, MAXC = 1024 };

#define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"

typedef struct Paciente {
    char dni[DNI];
    char nombre[NAME];
    char primerApellido[NAME];
    char segundoApellido[NAME];
    char sexo;
    int edad;
} Paciente;

Paciente *cargarPacientes (FILE *fp, size_t *n)
{
    char buf[MAXC];     /* buffer to read each line */
    size_t npt = NP;    /* no. Paciente struct allocated */
    Paciente *pt = malloc (npt * sizeof *pt);   /* allocate initial NP struct */

    if (!pt) {          /* validate every allocation */
        perror ("malloc-pt");
        return NULL;
    }

    while (fgets (buf, MAXC, fp)) {     /* read each line into buf */
        Paciente tmp = { .dni = "" };   /* temp struct to hold values */
        /* parse line into separate member values using sscanf */
        if (sscanf (buf, FMT, tmp.dni, tmp.nombre, tmp.primerApellido,
                    tmp.segundoApellido, &tmp.sexo, &tmp.edad) == 6) {
            /* check if used == allocated to check if realloc needed */
            if (*n == npt) {
                /* always realloc using temporary pointer */
                void *ptmp = realloc (pt, 2 * npt * sizeof *pt);
                if (!ptmp) {    /* validate every realloc */
                    perror ("realloc-pt");
                    break;
                }
                pt = ptmp;      /* assign newly sized block to pt */
                npt *= 2;       /* update no. of struct allocated */
            }
            pt[(*n)++] = tmp;   /* assign struct to next struct */
        }
    }

    return pt;  /* return pointer to allocated block of mem containing pt */
}

int main (int argc, char **argv) {

    Paciente *pt = NULL;
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    pt = cargarPacientes (fp, &n);  /* call function assign allocated return */
    if (!pt) {  /* validate the return was not NULL */
        fputs ("cargarPacientes-empty\n", stderr);
        return 1;
    }

    if (fp != stdin)   /* close file if not stdin */
        fclose (fp);

    for (size_t i = 0; i < n; i++) {    /* output all struct saved in pt */
        printf ("%-9s %-10s %-10s %-10s  %c  %d\n", pt[i].dni, pt[i].nombre,
                pt[i].primerApellido, pt[i].segundoApellido, pt[i].sexo,
                pt[i].edad);
    }

    free (pt);    /* don't forget to free the memory you have allocated */
}

Example Use/Output

With your sample data in the file dat/patiente.csv, the program produces the following output:

$ ./bin/readpatiente dat/patiente.csv
10190935A Sonia      Arroyo     Quintana    M  70
99830067Q Josefa     Cuenca     Orta        M  42
28122337F Nuria      Garriga    Dura        M  43
03265079E Manuel     Orts       Robles      H  45
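Since the original goal was to use the data later in other functions, here is a minimal sketch of how the returned block and its count might be handed on. The function buscarPorDni is hypothetical (it does not appear in the code above), and the struct definition is repeated only so the snippet stands alone. The point is that any function needing the data takes the pointer together with the number of elements, because the pointer by itself carries no length.

```c
#include <stddef.h>
#include <string.h>

enum { DNI = 10, NAME = 20 };

typedef struct Paciente {
    char dni[DNI];
    char nombre[NAME];
    char primerApellido[NAME];
    char segundoApellido[NAME];
    char sexo;
    int edad;
} Paciente;

/* buscarPorDni is a hypothetical function used only for this sketch.
 * Linear search over the n elements of pt; returns a pointer to the
 * first element whose dni matches, or NULL if there is none. */
const Paciente *buscarPorDni (const Paciente *pt, size_t n, const char *dni)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp (pt[i].dni, dni) == 0)
            return &pt[i];

    return NULL;
}
```

In main() this could be called as buscarPorDni (pt, n, "03265079E") any time after cargarPacientes() returns and before free (pt). The returned pointer refers into the same block, so it must not be used once the block has been freed.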

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure that you do not attempt to access memory or write beyond/outside the bounds of your allocated block, that you do not read or base a conditional jump on an uninitialized value, and finally to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through one.

$ valgrind ./bin/readpatiente dat/patiente.csv
==1099== Memcheck, a memory error detector
==1099== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1099== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1099== Command: ./bin/readpatiente dat/patiente.csv
==1099==
10190935A Sonia      Arroyo     Quintana    M  70
99830067Q Josefa     Cuenca     Orta        M  42
28122337F Nuria      Garriga    Dura        M  43
03265079E Manuel     Orts       Robles      H  45
==1099==
==1099== HEAP SUMMARY:
==1099==     in use at exit: 0 bytes in 0 blocks
==1099==   total heap usage: 5 allocs, 5 frees, 6,128 bytes allocated
==1099==
==1099== All heap blocks were freed -- no leaks are possible
==1099==
==1099== For counts of detected and suppressed errors, rerun with: -v
==1099== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

This is much simpler than trying to hardcode fixed 2D arrays in an attempt to handle parsing the values from the file. Look things over and let me know if you have further questions.
