CSV file advanced parsing

I have a problem parsing a .csv file. I have a struct world defined like this:

typedef struct world
{
    char worldName[30];
    int worldId;
    char *message;
    char **constellationArray;
    struct world *next;
} tWorld;

And I have a .csv file laid out like this (so the 'c' stands for 'semi-Colon'):

worldId;worldName;message;constellationArray
1;K'tau;Planeta pod ochranou Freyra;Aquarius;Crater;Orion;Sagittarius;Cetus;Gemini;Earth
2;Martin's homeworld;Znicena;Aries;Sagittarius;Monoceros;Serpens;Caput;Scutum;Hydra;Earth
3;...

The task seems simple: write a function loadWorlds(char *file) that loads the file and parses it. The number of constellations per world is not fixed. Each line describes a new world, and I have to build a linked list of these worlds. I have a rough idea of how to do this, but I can't make it work. I have a function tWorld *createWorld(), implemented like this:

tWorld *createWorld() {
    tWorld *world;
    world = (tWorld *)malloc(sizeof(tWorld));   /* allocate one (uninitialized) tWorld */
    return world;
}

I have to use this function inside my loadWorlds(char *file). Plus, I have to link the worlds into the linked list with this:

if (*lastWorld == NULL){
    *lastWorld = nextWorld;
}else{
    (*actualWorld)->next = nextWorld;
}
*actualWorld = nextWorld;

But I don't know when to use it. This is my rough sketch of loadWorlds(char *file):

void loadWorlds(char *file)
{
    FILE *f;
    char text[30];
    int letter;                 /* fgetc() returns int so EOF can be detected */
    tWorld *lastWorld = NULL, *actualWorld = NULL, *world;

    //f = fopen(file, "r");

    if(!(f = fopen(file, "r")))
    {
        printf("File does not exist! \n");
        while(!kbhit());        /* kbhit() is non-standard (<conio.h>) */
    }
    else
    {
        /* read one character at a time until end of line or end of file */
        while((letter = fgetc(f)) != EOF && letter != '\n')
        {

            if(letter != ';')
            {

            }

        }
        fclose(f);
    }
}

I would be grateful for any ideas on how to make this work.

The question "How do I parse this file? ... (Plus I have to serialize them into the linked list)" is a non-trivial undertaking when considered as a whole. "How do I parse this file?" is a question in its own right. The second part, regarding the linked list, is a separate issue that is not explained sufficiently, though it appears you are referring to a singly linked list. There are as many ways to approach this as there are labels of wine. I'll attempt to provide an example of one approach to help you along.

In the example below, rather than keeping a single static character array worldName in a tWorld struct whose other strings are all dynamically allocated, I've changed worldName to a character pointer as well. If you must use a static array of chars, that is easy to change back, but as long as you are allocating the remaining strings, it makes sense to allocate worldName too.

As to the parsing part of the question, you can use any of the library functions mentioned in the comments, or you can simply use a couple of pointers and step through each line, parsing each string as required. Either approach is fine. The only benefit of using simple pointers (aside from the learning aspect) is avoiding repetitive function calls, which in some cases can be a bit more efficient. One note when parsing data from a dynamically allocated line: make sure you preserve the buffer's starting address so the allocated memory can be properly tracked and freed. Some library functions modify what you pass them (strtok, for example, overwrites each delimiter in the buffer with a nul byte, and strsep advances the pointer you hand it), which can cause interesting errors if you pass the buffer pointer itself without, in some way, preserving the original start address.

The function read_list_csv below parses each line read from the csv file (actually semicolon-separated values) into the members of a tWorld struct, using a pair of character pointers to walk the input line. read_list_csv then calls ins_node_end to insert each filled and allocated tWorld node into a singly linked circular list. The parsing is commented to help explain the logic, but in summary: it sets a starting pointer p to the beginning of the field, then advances an ending pointer ep over each character until a semicolon ; is found. It temporarily sets that ; to \0 (nul) and reads the string pointed to by p. The temporary \0 is then replaced with the original ; and the process repeats starting with the following character, until the whole line has been parsed.

The linked-list part of your question is somewhat more involved. It is complicated by the fact that many linked-list examples are only partially explained, and several different designs are equally correct. Further, a linked list is of little use unless you can add to it, read from it, remove from it, and get rid of it without leaking memory like a sieve. When you look at examples, note that linked lists take two primary forms: HEAD/TAIL lists and circular lists. Both can be either singly or doubly linked. HEAD/TAIL lists generally keep separate pointers to the list start (HEAD) and the list end (TAIL), and the TAIL node's next pointer is set to NULL. Circular lists simply have the end node's next pointer point back to the beginning of the list. Both have their uses. The primary benefit of a circular list is that you can traverse from any node to any other node, regardless of where you start (since there is no end node, you can iterate through all nodes starting from any node).

The example below is a singly linked circular list. It provides functions for creating nodes, inserting them into the list, counting the nodes, printing the entire list, removing nodes from the list, and deleting the list. Importantly, it frees all memory allocated to the list.

Go through both the parsing part and the linked-list part of the example and let me know if you have questions. While the list implementation should be fairly solid, there may be some undiscovered issues. The data file used for testing and the sample output are shown after the code. The program expects the data file as the first argument and an optional (zero-based) index of the node to delete as the second argument (default: node 2):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXL 256
// #define wName 30

typedef struct world
{
    // char worldName[wName];
    char *worldName;
    int worldId;
    char *message;
    char **constellationArray;
    struct world *next;
} tWorld;

/* allocate & populate node */
tWorld *create_node (int wid, char *wnm, char *msg, char **ca);

/* insert node into list */
tWorld *ins_node_end (tWorld **list, int wid, char *wnm, char *msg, char **ca);

/* read data from file fname and add to list */
tWorld *read_list_csv (tWorld **list, char *fname);

/* return the number of nodes in list */
size_t getszlist (tWorld *list);

/* print all nodes in list */
void print_list (tWorld *list);

/* free memory allocated to tWorld list node */
void free_node (tWorld *node);

/* (zero-based) delete of nth node */
void delete_node (tWorld **list, int nth);

/* delete tWorld list & free allocated memory */
void delete_list (tWorld *list);

int main (int argc, char **argv)
{
    if (argc < 2) {
        fprintf (stderr, "error: insufficient input. Usage: %s <filename> [del_row]\n", argv[0]);
        return 1;
    }

    char *fname = argv[1];
    tWorld *myworld = NULL;             /* create pointer to struct world   */

    read_list_csv (&myworld, fname);    /* read fname and fill linked list  */

    printf ("\n Read '%zu' records from file: %s\n\n", getszlist (myworld), fname);

    print_list (myworld);               /* simple routine to print list     */

    int nth = (argc > 2) ? atoi (argv[2]) : 2;
    printf ("\n Deleting node: %d\n\n", nth);
    delete_node (&myworld, nth);        /* delete a node from the list      */

    print_list (myworld);               /* simple routine to print list     */

    delete_list (myworld);              /* free memory allocated to list    */

    return 0;
}

/* allocate & populate node */
tWorld *create_node (int wid, char *wnm, char *msg, char **ca) 
{
    tWorld *node = NULL;

    node = malloc (sizeof *node);
    if (!node) return NULL;

    node-> worldId = wid;
    node-> worldName = wnm;
    node-> message = msg;
    node-> constellationArray = ca;

    return node;
}

/* insert node into list */
tWorld *ins_node_end (tWorld **list, int wid, char *wnm, char *msg, char **ca) 
{
    tWorld *node = NULL;
    if (!(node = create_node (wid, wnm, msg, ca))) return NULL;


    if (!*list) {    /* if empty, create first node */
        node-> next = node;
        *list = node;
    } else {         /* insert as new end node */
        if (*list == (*list)-> next) { /* second node, no need to iterate */
            (*list)-> next = node; 
        }
        else                           /* iterate to end node & insert    */
        {
            tWorld *iter = *list;      /* second copy to iterate list     */
            for (; iter->next != *list; iter = iter->next) ;
            iter-> next = node;        /* insert node at end of list      */
        }
        node-> next = *list;           /* set next pointer to list start  */
    }

    return *list;   /* provides return as confirmation  */
}

/* read list from file fname and add to list */
tWorld *read_list_csv (tWorld **list, char *fname)
{
    FILE *fp = fopen (fname, "r");
    if (!fp) {
        fprintf (stderr, "%s() error: file open failed for '%s'\n", __func__, fname);
        return NULL;
    }

    /* allocate and initialize all variables */
    char *line = calloc (MAXL, sizeof *line);
    if (!line) {
        fprintf (stderr, "%s() error: allocation failed for 'line'.\n", __func__);
        fclose (fp);
        return NULL;
    }
    char *p = NULL;
    char *ep = NULL;
    char *wnm = NULL;
    int wid = 0;
    int lcnt = 0;
    char *msg = NULL; 
    char **ca = NULL;
    size_t idx = 0;

    while (fgets (line, MAXL, fp))      /* for each line in file    */
    {
        if (lcnt++ == 0) continue;      /* skip header row          */

        p = line;
        idx = 0;
        ep = p;
        size_t len = strcspn (line, "\r\n");  /* strip "\n" or "\r\n" */
        line[len] = 0;
        if (len == 0) continue;         /* skip blank lines         */

        /* each scan also stops at the terminating nul, so a line with
           too few fields is skipped instead of read past its end    */
        while (*ep && *ep != ';') ep++; /* parse worldId            */
        if (!*ep) continue;
        *ep = 0;
        wid = atoi (p);
        *ep++ = ';';
        p = ep;

        while (*ep && *ep != ';') ep++; /* parse worldName          */
        if (!*ep) continue;
        *ep = 0;
        wnm = strdup (p);
        *ep++ = ';';
        p = ep;

        while (*ep && *ep != ';') ep++; /* parse message            */
        if (!*ep) { free (wnm); continue; }
        *ep = 0;
        msg = strdup (p);
        *ep++ = ';';
        p = ep;

        ca = calloc (MAXL, sizeof *ca); /* allocate constellationArray */
        if (!ca) {
            fprintf (stderr, "%s() error allocation failed for 'ca'.\n", __func__);
            free (wnm);
            free (msg);
            break;                      /* still close file & free line */
        }
        while (*ep)                     /* parse ca array elements  */
        {
            if (*ep == ';')
            {
                *ep = 0;
                ca[idx++] = strdup (p);
                *ep = ';';
                p = ep + 1;
                /* if (idx == MAXL) reallocate ca */
            } 
            ep++;
        }
        if (*p) ca[idx++] = strdup (p); /* add last element in line */

        ins_node_end (list, wid, wnm, msg, ca); /* add to list      */
    }

    /* close file & free line */
    fclose (fp);
    free (line);

    return *list;
}

/* return the number of nodes in list */
size_t getszlist (tWorld *list) {

    const tWorld *iter = list;  /* pointer to iterate list  */
    register int cnt = 0;

    if (iter ==  NULL) {
        fprintf (stdout,"%s(), The list is empty\n",__func__);
        return 0;
    }

    for (; iter; iter = (iter->next != list ? iter->next : NULL)) {
        cnt++;
    }
    return cnt;
}

/* print all nodes in list */
void print_list (tWorld *list) {

    const tWorld *iter = list;  /* pointer to iterate list  */
    register int idx = 0;
    char *stub = " ";

    if (iter ==  NULL) {
        fprintf (stdout,"%s(), The list is empty\n",__func__);
        return;
    }

    for (; iter; iter = (iter->next != list ? iter->next : NULL)) {
        printf (" %2d  %-20s  %-20s\n", 
                iter-> worldId, iter-> worldName, iter-> message);
        idx = 0;
        while ((iter-> constellationArray)[idx])
            printf ("%38s %s\n", stub, (iter-> constellationArray)[idx++]);
    }
}

/* free memory allocated to tWorld list node */
void free_node (tWorld *node)
{
    if (!node) return;

    register int i = 0;

    if (node-> worldName) free (node-> worldName);
    if (node-> message) free (node-> message);
    if (node-> constellationArray) {    /* test before indexing it  */
        while (node-> constellationArray[i])
            free (node-> constellationArray[i++]);
        free (node-> constellationArray);
    }

    free (node);
}

/* (zero-based) delete of nth node */
void delete_node (tWorld **list, int nth)
{
    /* test that list exists */
    if (!*list) {
        fprintf (stdout,"%s(), The list is empty\n",__func__);
        return;
    }

    /* get list size */
    int szlist = getszlist (*list);

    /* validate node to delete */
    if (nth >= szlist || nth < 0) {
        fprintf (stderr, "%s(), error: delete out of range (%d). allowed: (0 <= nth <= %d)\n", 
                __func__, nth, szlist-1);
        return;
    }

    /* create node pointers */
    tWorld *victim = *list;
    tWorld *prior = victim;

    /* if nth 0, prior is last, otherwise node before victim */
    if (nth == 0) {
        for (; prior->next != *list; prior = prior->next) ;
    } else {
        while (nth-- && victim-> next != *list) {
            prior = victim;
            victim = victim-> next;
        }
    }

    /* non-self-reference node, rewire next */
    if (victim != victim->next) {
        prior-> next = victim-> next;

        /* if deleting node 0, change list pointer address */
        if (victim == *list)
            *list = victim->next;
    } else {  /* if self-referenced, last node, delete list */
        *list = NULL;
    }

    free_node (victim);  /* free memory associated with node */
}

/* delete tWorld list */
void delete_list (tWorld *list)
{
    if (!list) return;

    tWorld *iter = list-> next;  /* start with the node after list    */
    list-> next = NULL;          /* break the circle so the loop ends */

    while (iter) {
        tWorld *next = iter-> next;  /* save next before freeing iter */
        free_node (iter);
        iter = next;
    }
}

Input test data file:

$ cat dat/struct.csv

worldId;worldName;message;constellationArray
1;K'tau;Planeta pod ochranou Freyra;Aquarius;Crater;Orion;Sagittarius;Cetus;Gemini;Earth
2;Martin's homeworld;Znicena;Aries;Sagittarius;Monoceros;Serpens;Caput;Scutum;Hydra;Earth
3;Martin's homeworld2;Znicena2;Aries2;Sagittarius2;Monoceros2;Serpens2;Caput2;Scutum2;Hydra2;Earth2
4;Martin's homeworld3;Znicena3;Aries3;Sagittarius3;Monoceros3;Serpens3;Caput3;Scutum3;Hydra3;Earth3

Output:

$ ./bin/struct_ll_csv dat/struct.csv 1


 Read '4' records from file: dat/struct.csv

  1  K'tau                 Planeta pod ochranou Freyra
                                       Aquarius
                                       Crater
                                       Orion
                                       Sagittarius
                                       Cetus
                                       Gemini
                                       Earth
  2  Martin's homeworld    Znicena
                                       Aries
                                       Sagittarius
                                       Monoceros
                                       Serpens
                                       Caput
                                       Scutum
                                       Hydra
                                       Earth
  3  Martin's homeworld2   Znicena2
                                       Aries2
                                       Sagittarius2
                                       Monoceros2
                                       Serpens2
                                       Caput2
                                       Scutum2
                                       Hydra2
                                       Earth2
  4  Martin's homeworld3   Znicena3
                                       Aries3
                                       Sagittarius3
                                       Monoceros3
                                       Serpens3
                                       Caput3
                                       Scutum3
                                       Hydra3
                                       Earth3

 Deleting node: 1

  1  K'tau                 Planeta pod ochranou Freyra
                                       Aquarius
                                       Crater
                                       Orion
                                       Sagittarius
                                       Cetus
                                       Gemini
                                       Earth
  3  Martin's homeworld2   Znicena2
                                       Aries2
                                       Sagittarius2
                                       Monoceros2
                                       Serpens2
                                       Caput2
                                       Scutum2
                                       Hydra2
                                       Earth2
  4  Martin's homeworld3   Znicena3
                                       Aries3
                                       Sagittarius3
                                       Monoceros3
                                       Serpens3
                                       Caput3
                                       Scutum3
                                       Hydra3
                                       Earth3

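Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.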

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM