
Read a Comma Delimited Text File into a Struct in C

I have a comma-delimited list of boats and their specs that I need to read into a struct. Each line describes a different boat along with its specs, so I have to read the file line by line.

Sample input file (the file I'll be using has over 20 lines):

pontoon,Crest,Carribean RS 230 SLC,1,Suzuki,115,Blue,26,134595.00,135945.00,1,200,0,250,450,450,0
fishing,Key West,239 FS,1,Mercury,250,Orange,24,86430.00,87630.00,0,0,250,200,500,250,0
sport boat,Tahoe,T16,1,Yamaha,300,Yellow,22,26895.00,27745.00,0,250,0,0,350,250,0

I have a linked list of watercraft_t nodes:

typedef struct watercraft {
    char type[15];     // e.g. pontoon, sport boat, sailboat, fishing, 
                       //      canoe, kayak, jetski, etc.
    char make[20];
    char model[30];
    int propulsion;    // 0 = none; 1 = outBoard; 2 = inBoard; 
    char engine[15];   // Suzuki, Yamaha, etc.
    int hp;             // horse power  
    char color[25];
    int length;        // feet
    double base_price;
    double total_price;
    accessories_t extras;
    struct watercraft *next;
} watercraft_t;

My main function opens the file and stores the stream in a pointer:

FILE * fp = fopen(argv[1], "r"); // Opens file got from command line arg

This file is then passed to a function that should parse exactly one line and return the resulting node, which is then placed in a linked list.

 // Create watercrafts from the info in file
watercraft_t *new_waterCraft( FILE *inFile )
{
    watercraft_t *newNode;

    newNode = (watercraft_t*)malloc(sizeof(watercraft_t));

    fscanf(inFile, "%s %s %s %d %s %d %s %d %lf %lf", newNode->type, newNode->make, newNode->model, &(newNode->propulsion), newNode->engine, &(newNode->hp), newNode->color, &(newNode->length), &(newNode->base_price), &(newNode->total_price));

    return newNode;
}

When I call a function that prints just the type of each boat, this is the result:

1. pontoon,Crest,CRS
2. SLC,1,Suzuki,11fishing,Key
3. SLC,1,Suzuki,11fishing,Key
4. SLC,1,Suzuki,11fishing,Key
5. SLC,1,Suzuki,11fishing,Key
6. SLC,1,Suzuki,11fishing,Key
7. SLC,1,Suzuki,11fishing,Key
8. SLC,1,Suzuki,11fishing,Key
9. SLC,1,Suzuki,11fishing,Key
10. SLC,1,Suzuki,11fishing,Key
11. SLC,1,Suzuki,11fishing,Key
12. SLC,1,Suzuki,11fishing,Key
13. SLC,1,Suzuki,11fishing,Key
14. SLC,1,Suzuki,11fishing,Key
15. SLC,1,Suzuki,11fishing,Key
16. SLC,1,Suzuki,11fishing,Key
17. SLC,1,Suzuki,11fishing,Key

I've narrowed the issue down to how I'm reading the values from the file with fscanf.

The first thing I attempted was to put %*c between all of my placeholders, but after running that, my output looked exactly the same. The next thing I realized is that I won't be able to use fscanf this way, because the text file contains whitespace that needs to be read.

My next thought was to use fgets, but I don't think I will be able to use that either, because I'm not sure how many characters will have to be read each time. I just need it to stop reading at the end of the line while separating the values at the commas.

I've been searching for answers for a few hours now, but nothing has worked so far.

When you use %s, the text is parsed until a space or a newline character is found. For the first line of your file, for instance, fscanf stores "pontoon,Crest,Carribean" in type, because that is the first %s in the format string; parsing stops when the space is found. That string is also longer than the 15 bytes available in type, so the read overflows the buffer.
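The behavior described above can be checked in isolation. This is a minimal sketch; first_token is a hypothetical helper (not part of the question's code), and it uses sscanf on a string rather than fscanf on a file so that it needs no input file:

```c
#include <stdio.h>

/* Hypothetical helper: reads the first %s token of `line` into `out`.
   Returns the number of items sscanf converted (1 on success).
   %63s bounds the read, so `out` must hold at least 64 bytes. */
int first_token(const char *line, char *out)
{
    return sscanf(line, "%63s", out);
}
```

Called on the first sample line, the token ends at the first space, so it still contains the commas and spans what should have been three separate fields.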

The fscanf format string must match the line in the file, including the commas, so you would need something like this:

" %14[^,], %19[^,], %29[^,], %d , %14[^,], %d , %24[^,], %d , %lf , %lf /*...*/"

(Note the space at the beginning of the format string; it skips any whitespace left over from previous reads, such as the newline that ends the previous line.)

The conversion specifier %[^,] makes fscanf read until a comma is found or the width limit is reached. Unlike %s, it does not stop at spaces, so fields such as "Key West" are read whole. Moreover, using %14[^,] avoids potential undefined behavior through buffer overflow, because it limits the read to 14 characters plus the null terminator, which matches the buffer size of 15.

Using fgets to read the line is a good idea; you can then use sscanf to convert the values. It works like fscanf but reads from a string.

I would advise you to check the return value of the *scanf call to make sure that the expected number of fields was read.
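Putting the format string and the return-value check together might look like the sketch below. It is an illustration under assumptions: boat_t is a reduced stand-in for the question's watercraft_t (without extras and next), parse_boat is a hypothetical name, and only the first ten fields of a line are converted.

```c
#include <stdio.h>

/* Reduced stand-in for the question's watercraft_t (assumption:
   the accessories and the next pointer are left out). */
typedef struct {
    char type[15];
    char make[20];
    char model[30];
    int propulsion;
    char engine[15];
    int hp;
    char color[25];
    int length;
    double base_price;
    double total_price;
} boat_t;

/* Hypothetical helper: parses the first ten fields of one CSV line.
   Returns 1 if all ten fields were converted, 0 otherwise. */
int parse_boat(const char *line, boat_t *b)
{
    /* Each width is one less than the size of its destination buffer. */
    int n = sscanf(line,
                   " %14[^,], %19[^,], %29[^,], %d , %14[^,], %d ,"
                   " %24[^,], %d , %lf , %lf",
                   b->type, b->make, b->model, &b->propulsion,
                   b->engine, &b->hp, b->color, &b->length,
                   &b->base_price, &b->total_price);
    return n == 10;
}
```

A caller would typically read each line with fgets into a buffer, pass it to parse_boat, and skip or report any line for which it returns 0.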

"My next thought was to use fgets, but I don't think I will be able to use that either, because I'm not sure how many characters will have to be read each time. I just need it to stop reading at the end of the line while separating the values at the commas."

Not a bad approach. It can be done this way:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("in.txt", "r");

    while(1) {
        int length = 0;
        int ch;
        long offset= ftell(fp);

        // Calculate length of next line
        while((ch = fgetc(fp)) != '\n' && ch != EOF)
            length++;
        
        // Go back to beginning of line
        fseek(fp, offset, SEEK_SET);

        // If EOF, it's the last line, and if the length is zero, we're done
        if(ch == EOF && length == 0)
            break;
    
        // Allocate space for line plus BOTH \n AND \0
        char *buffer = malloc(length+2);

        // Read the line
        fgets(buffer, length+2, fp);

        // Do something with the buffer, for instance with sscanf

        // Cleanup
        free(buffer);
        if(ch == EOF) break;
    }

    fclose(fp);
}

I have omitted all error checking to keep the code short.
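As an alternative sketch that avoids reading each line twice and seeking back, the buffer can grow while fgets is called repeatedly. read_line is a hypothetical helper name, and the starting capacity of 128 is an arbitrary assumption:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp.
   Returns a malloc'd string without the trailing newline, or NULL at
   end of file or on allocation failure. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: strip the newline */
            return buf;
        }
        if (cap - len < 2) {            /* buffer full: double it, read on */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (len == 0) {                     /* nothing was read: end of file */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no newline */
}
```

Each returned string can then be handed to sscanf, as in the loop above, and freed afterwards.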

I wouldn't recommend the fscanf approach: it is error-prone and inflexible.

Here is an example that parses a CSV file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    #define LINE_MAX 1024
    char line_buf[LINE_MAX];
    char *line = line_buf;

    char *delim;
    int sep = ',';

    #define FIELD_MAX 128
    char field[FIELD_MAX];

    FILE *fp = fopen(argv[1], "r");

    //foreach line
    while ((line = fgets(line_buf, LINE_MAX, fp))) {

        //iterate over the line
        for (char *line_end = line + strlen(line) - 1; line < line_end;) {

            //search for a separator
            delim = strchr(line, sep);

            //strchr returns NULL if no separator was found
            if (delim == NULL) {
                //we set delim to the end of line,
                //because we want to process the remaining chars (field)
                delim = line_end;
            }

            //the first character of field is at 'line'
            //the last character of field is at delim - 1
            //delim points to the separator

            //e.g.
            size_t len = delim - line;
            memcpy(field, line, len);
            field[len] = '\0';
            printf("%s ", field);
            //end of example

            //set the position to the next character after the separator
            line = delim + 1;

        }

        printf("\n");

    }

    return EXIT_SUCCESS;
}

Note: the CSV file must end with a newline character; otherwise the last character of the last line is dropped. The reason is that line_end = line + strlen(line) - 1 assumes the final character of every line is '\n'.
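A variant of the same strchr-based idea that does not depend on the trailing newline could look like the sketch below. split_fields is a hypothetical helper; unlike the example above, it modifies the line in place and stores pointers to the fields instead of copying each one into a separate buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on `sep` and stores a
   pointer to each field in fields[]. Returns the number of fields
   stored (at most max); any fields beyond max are ignored. */
size_t split_fields(char *line, char sep, char *fields[], size_t max)
{
    size_t n = 0;

    /* Cut the line at its line ending, if it has one. */
    line[strcspn(line, "\r\n")] = '\0';

    while (n < max) {
        fields[n++] = line;             /* a field starts here */
        char *delim = strchr(line, sep);
        if (delim == NULL)
            break;                      /* that was the last field */
        *delim = '\0';                  /* terminate the current field */
        line = delim + 1;               /* continue after the separator */
    }
    return n;
}
```

Because the line ending is removed first, the last line of a file is handled the same way whether or not it ends with a newline. Numeric fields can then be converted with strtol or strtod, and text fields copied into the struct with a bounded copy.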

