Parsing .csv file into 2D array in C

I have a .csv file that reads like:

SKU,Plant,Qty
40000,ca56,1245
40000,ca81,12553.3
40000,ca82,125.3
45000,ca62,0
45000,ca71,3
45000,ca78,54.9

Note: This is just my example; the real file has about 500,000 rows and 3 columns.

I am trying to convert these entries into a 2D array so that I can then manipulate the data. You'll notice that in my example I just set up a small 10x10 matrix A to get the example working before moving on to the real thing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *getfield(char *line, int num);

int main() {
    FILE *stream = fopen("input/input.csv", "r");
    char line[1000000];
    int A[10][10];
    int i, j = 0;

    //Zero matrix
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            A[i][j] = 0;
        }
    }

    for (i = 0; fgets(line, 1000000, stream); i++) {
        while (j < 10) {
            char *tmp = strdup(line);
            A[i][j] = getfield(tmp, j);
            free(tmp);
            j++;
        }
    }
    //print matrix
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            printf("%s\t", A[i][j]);
        }
        printf("\n");
    }
}

const char *getfield(char *line, int num) {
    const char *tok;
    for (tok = strtok(line, ",");
         tok && *tok;
         tok = strtok(NULL, ",\n"))
    {
        if (!--num)
            return tok;
    }
    return 0;
}

It prints only "null" errors, and I believe I am making a pointer-related mistake on this line: A[i][j] = getfield(tmp, j). I'm just not really sure how to fix that.

This work is based almost entirely on this question: Read .CSV file in C. Any help in adapting it would be very much appreciated, as it's been a couple of years since I last touched C or external files.

It looks like commenters have already helped you find a few errors in your code. However, the problems are pretty entrenched. One of the biggest issues is that you're using strings. Strings are, of course, char arrays; that means that there's already a dimension in use, so a table of strings cannot be stored in a 2D array of int.

It would probably be better to just use a struct like this:

struct csvTable
{
    char sku[10];
    char plant[10];
    char qty[10];
};

That will also allow you to give each column the right data type (it looks like SKU could be an int, but I don't know the context).

Here's an example of the struct approach with string fields. I apologize for the mess; it's adapted on the fly from something I was already working on.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Based on your estimate
// You could make this adaptive or dynamic

#define rowNum 500000

struct csvTable
{
    char sku[10];
    char plant[10];
    char qty[10];
};

// Declare table
struct csvTable table[rowNum];

int main()
{
    // Load file
    FILE* fp = fopen("demo.csv", "r");

    if (fp == NULL)
    {
        fprintf(stderr, "Couldn't open file\n");
        return 1; // Report failure to the caller
    }

    // Read up to rowNum rows, stopping early at end of file
    int rowsRead = 0;
    char entry[100];

    while (rowsRead < rowNum && fgets(entry, sizeof entry, fp) != NULL)
    {
        // Treat line endings as delimiters so qty does not keep the '\n'
        char *sku = strtok(entry, ",\r\n");
        char *plant = strtok(NULL, ",\r\n");
        char *qty = strtok(NULL, ",\r\n");

        // Skip malformed lines and fields too long for the 10-byte buffers
        if (sku == NULL || plant == NULL || qty == NULL ||
            strlen(sku) >= sizeof table[0].sku ||
            strlen(plant) >= sizeof table[0].plant ||
            strlen(qty) >= sizeof table[0].qty)
        {
            continue;
        }

        strcpy(table[rowsRead].sku, sku);
        strcpy(table[rowsRead].plant, plant);
        strcpy(table[rowsRead].qty, qty);
        rowsRead++;
    }
    fclose(fp);

    // Prove that the process worked (only for rows actually read)
    for (int printCounter = 0; printCounter < rowsRead; printCounter++)
    {
        printf("Row %d: column 1 = %s, column 2 = %s, column 3 = %s\n",
            printCounter + 1, table[printCounter].sku,
            table[printCounter].plant, table[printCounter].qty);
    }

    // Wait for keypress to exit
    getchar();

}

There are multiple problems in your code:

  • In the second loop, you do not stop reading the file after 10 lines, so you would try to store elements beyond the end of the A array.
  • You do not reset j to 0 at the start of the while (j < 10) loop. j happens to have the value 10 at the end of the initialization loop, so you effectively do not store anything into the matrix.
  • The matrix A should be a 2D array of char *, not int, or potentially an array of structures.

Here is a simpler version with an allocated array of structures:

#include <stdio.h>
#include <stdlib.h>

typedef struct item_t {
    char SKU[20];
    char Plant[20];
    char Qty[20];
} item_t;

int main(void) {
    FILE *stream = fopen("input/input.csv", "r");
    char line[200];
    int size = 0, len = 0, i;
    char c;  // %c in sscanf stores into a char, not an int
    item_t *A = NULL;

    if (stream) {
        while (fgets(line, sizeof(line), stream)) {
            if (len == size) {
                size = size ? size * 2 : 1000;
                A = realloc(A, sizeof(*A) * size);
                if (A == NULL) {
                    fprintf(stderr, "out of memory for %d items\n", size);
                    return 1;
                }
            }
            if (sscanf(line, "%19[^,\n],%19[^,\n],%19[^,\n]%c",
                       A[len].SKU, A[len].Plant, A[len].Qty, &c) != 4
            ||  c != '\n') {
                fprintf(stderr, "invalid format: %s\n", line);
            } else {
                len++;
            }
        }
        fclose(stream);

        //print matrix
        for (i = 0; i < len; i++) {
            printf("%s,%s,%s\n", A[i].SKU, A[i].Plant, A[i].Qty);
        }
        free(A);
    }
    return 0;
}
