Using strtok to read a CSV file

I am trying to use strtok in C to read a CSV file and store the contents in an array of struct Game. My code is shown below:

    FILE *fp;
    int i = 0;
    if ((fp = fopen("Games.csv", "r")) == NULL) {
        printf("Can't open file.\n");
        exit(1);
    }
    rewind(fp);
    char buff[1024];
    fgets(buff, 1024, fp);
    char *delimiter = ",";

    while (fgets(buff, 1024, fp) != NULL && i < 5) {
        Game[i].ProductID = strtok(buff, ",");
        Game[i].ProductName = strtok(NULL, delimiter);
        Game[i].Publisher = strtok(NULL, delimiter);
        Game[i].Genre = strtok(NULL, delimiter);
        Game[i].Taxable = atoi(strtok(NULL, delimiter));
        Game[i].price = strtok(NULL, delimiter);
        Game[i].Quantity = atoi(strtok(NULL, delimiter));

        printf("%s\n", Game[i].ProductID);

        i++;
    }

    i = 0;
    for (i = 0; i < 5; i++) {
        printf("%s", Game[i].ProductID);
    }

The output is shown below:

    DS_25ROGVOIRY
    DS_25MMD4N2BL
    DS_258KADVNLH
    DS_25UR7M375D
    DS_25FP45CJFZ
    DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
    DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
    DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
    DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
    DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2

The first five lines (printed inside the while loop) are correct. However, the last five lines (printed after the while loop) are wrong: each one prints the whole content of a line.

I am really confused by this. When is the array being changed, and how can I still print the correct values after the while loop?

First, a primer on how strtok() works. The function gives you back a pointer to somewhere in the original string, and that string has been modified so that it looks like it contains only a single token (a).

For example, the first strtok of "A,B,C" would turn it into "A\0B,C" and give you back the address of the A character. Using the result at that point would then give you "A".

Similarly, the second call would turn it into "A\0B\0C" and give you back the address of the B character.

The fact that it gives you pointers into the original string is paramount here, because that original string is located in buff.

And you're actually overwriting buff every time you read a line from the file. So, for all five lines, Game[i].ProductID will simply be the address of the first character of buff. After you have processed the fifth line, the line:

    while (fgets(buff, 1024, fp) != NULL && i < 5)

will first read in the sixth line before exiting the loop.

This is why the final lines you see are not the same as any of the first five. You're printing the C strings for ProductID, which all sit at the (identical) address of buff, so you only see the sixth line, and you see the full line because you didn't tokenise it after reading it in.

What you need to do is make a copy of the tokens before overwriting the line. That can be done with something like the following (it's a little complex, but it correctly handles the case where strtok returns NULL):

    if ((Game[i].ProductID = strtok(buff, ",")) != NULL)
        Game[i].ProductID = strdup(Game[i].ProductID);

remembering that you should free those memory allocations at some point.

In the incredibly unlikely event that your environment doesn't have strdup (it's POSIX rather than ISO C, although C23 has since added it), see here.


And, just as an aside, most CSV implementations allow for embedded commas, for example by enclosing the field in quotes or by escaping the comma (the latter is rare, but I have seen it):

    name,"diablo, pax",awesome
    name,diablo\, pax,awesome

Both of those would be expected to yield three fields: "name", "diablo, pax" and "awesome".

Simplified processing with strtok will not allow for such complexities but, assuming your fields do not contain embedded commas, it may be okay. If your input is more complex, you may be better off using a third-party CSV library (with a suitable licence, of course).


(a) For the language lawyers among us, this is covered in the ISO C standard, C11 7.24.5.8 The strtok function, paragraphs /3 and /4:

3/ The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

4/ The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.

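Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.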

 
© 2020-2024 STACKOOM.COM