Read a CSV file line by line and output the fifth column for matching in C

Question

I am new to C, and I'm trying to find a way to read a CSV file and output the fifth field of each line until EOF.

My data looks like this:

05/02/2012 00:00:01.548,XOLT,1ZE86V280394811433,trackthepack,23.22.11.82,en_US,
05/02/2012 00:00:01.605,XOLT,1ZVzVrZVhOaGNtUnZi,hadees,50.16.47.103,en_US,VE
05/02/2012 00:00:01.647,XOLT,1ZbWhoY21GMGFHRnVY,hadees,50.19.203.230,en_US,VE
05/02/2012 00:00:02.275,XOLT,1Z4217060300279193,trackthepack,107.21.159.246,en_US,
05/02/2012 00:00:02.599,XOLT,1Z9X98040398954479,Cascademfg,66.117.15.81,en_US,NF
05/02/2012 00:00:02.639,XOLT,1Z3X252W0363295735,trackthepack,107.22.101.79,en_US,

I need to read this file, store the value of the fifth field (e.g. 23.22.11.82), and use it for further processing in a match.

In Java, I use the following code to split each CSV line:

String delims = "[,]";
while ((s1 = in.readLine()) != null && s1.length() != 0) {
    String[] tokens = s1.split(delims);
    // ... use tokens ...
}

Is there a similar way in C? I am asking because my code runs faster in C.

I was able to write some C code that reads the file (3 records), but it does not seem to detect the end of the line, and I am hitting a segmentation fault. I am using fgets and strtok.

The input file has variable-length lines delimited by commas (,). I want to get the fifth token in each line and then use it as a lookup key.

Here is the code:

#include "GeoIP.h"
#include "GeoIPCity.h"


static const char * _mk_NA( const char * p ){
 return p ? p : "N/A";
}

int 
main(int argc, char *argv[])
{
  FILE           *f;
  FILE           *out_f;
  GeoIP          *gi;
  GeoIPRecord    *gir;
  int             generate = 0;
  char            iphost[50];
  char            *nextWordPtr = NULL;
  int             wordCount =0;
  char            *rechost;
  char            recbuffer[1000];
  char delims[]=",";
  const char     *time_zone = NULL;
  char          **ret;
  if (argc == 2)
    if (!strcmp(argv[1], "gen"))
      generate = 1;

  gi = GeoIP_open("../data/GeoIPCity.dat", GEOIP_MEMORY_CACHE);

  if (gi == NULL) {
    fprintf(stderr, "Error opening database\n");
    exit(1);
  }

  f = fopen("city_test.txt", "r");

  if (f == NULL) {
    fprintf(stderr, "Error opening city_test.txt\n");
    exit(1);
  }

  out_f = fopen("out_city_lookup_test.txt", "w");

  if (out_f == NULL) {
    fprintf(stderr, "Error opening out_city_lookup_test.txt\n");
    exit(1);
  }

//** Read the file line by line and get the ip address to use to lookup GeoIP **//
//*     while (!feof(f)) {
   while (fgets(recbuffer,1001,f) != NULL) {
         nextWordPtr = strtok (recbuffer,delims); 
         while (nextWordPtr != NULL & wordCount < 5) {
           printf("word%d %s\n",wordCount,nextWordPtr);
           if (wordCount == 4 ) {
               printf("nextWordPtr %s\n",nextWordPtr);
               strcpy(iphost, nextWordPtr);
               printf("iphost %s\n",iphost);
           }    
           wordCount++;
           nextWordPtr = strtok(NULL,delims);
         }
    gir = GeoIP_record_by_name(gi, (const char *) iphost);

    if (gir != NULL) {
      ret = GeoIP_range_by_ip(gi, (const char *) iphost);
      time_zone = GeoIP_time_zone_by_country_and_region(gir->country_code, gir->region);
      printf("%s\t%s\t%s\t%s\t%s\t%s\t%f\t%f\t%d\t%d\t%s\t%s\t%s\n", iphost,
         _mk_NA(gir->country_code),
         _mk_NA(gir->region),
         _mk_NA(GeoIP_region_name_by_code(gir->country_code, gir->region)),
         _mk_NA(gir->city),
         _mk_NA(gir->postal_code),
         gir->latitude,
         gir->longitude,
         gir->metro_code,
         gir->area_code,
         _mk_NA(time_zone),
         ret[0],
         ret[1]);
      fprintf(out_f,"%s\t%s\t%s\t%s\t%s\t%s\t%f\t%f\t%d\t%d\t%s\t%s\t%s\n", iphost,
         _mk_NA(gir->country_code),
         _mk_NA(gir->region),
         _mk_NA(GeoIP_region_name_by_code(gir->country_code, gir->region)),
         _mk_NA(gir->city),
         _mk_NA(gir->postal_code),
         gir->latitude,
         gir->longitude,
         gir->metro_code,
         gir->area_code,
         _mk_NA(time_zone),
         ret[0],
         ret[1]);
      GeoIP_range_by_ip_delete(ret);
      GeoIPRecord_delete(gir);
    }
  }
  GeoIP_delete(gi);

  fclose(out_f);

  return 0;
}

Answer 1

Yes. It is not that elegant, but you can get the job done with strtok.
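A minimal sketch of the fgets plus strtok approach might look like the following. The function names nth_field_strtok and print_fifth_fields are made up for this illustration. Compared with the code in the question, the sketch restarts the field counter for every line, passes the real buffer size to fgets (the question passes 1001 for a 1000-byte buffer), checks the field length before copying, and includes <string.h> so that strtok is properly declared. Whether any of these is the actual cause of the reported segmentation fault is an assumption, not something the question confirms.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the n-th (1-based) comma-separated token of
 * line into out. Returns 1 on success, 0 if the line has fewer than n
 * non-empty tokens or the token does not fit in out.
 * Note: strtok modifies line in place. */
int nth_field_strtok(char *line, int n, char *out, size_t out_size)
{
    int count = 0;                        /* starts at zero for every line */
    char *tok;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the newline fgets keeps */

    for (tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (++count == n) {
            if (strlen(tok) >= out_size)  /* refuse to overflow out */
                return 0;
            strcpy(out, tok);
            return 1;
        }
    }
    return 0;                             /* fewer than n tokens */
}

/* Hypothetical driver: print the fifth token of every line read from f. */
void print_fifth_fields(FILE *f)
{
    char line[1000];
    char ip[50];

    while (fgets(line, sizeof line, f) != NULL) {
        if (nth_field_strtok(line, 5, ip, sizeof ip))
            printf("%s\n", ip);
    }
}
```

In the question's program, the fgets loop could call nth_field_strtok(recbuffer, 5, iphost, sizeof iphost) and only perform the GeoIP lookup when it returns 1, so a short or malformed line never reuses a stale iphost value.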

Answer 2

For what you want, a better approach is a lexer. If your end goal is complex, you might want a parser as well.

I've got an example lexer and parser here. It is more complex than what you need, though. If you want something simple, strtok will do the job, but there are several nasty surprises to watch out for. It will also be difficult to use outside the simple case you have presented here.
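One strtok surprise that matters for data like this: strtok treats a run of adjacent delimiters as a single delimiter, so an empty field (as in "a,,c") is silently skipped and every later token shifts one position to the left. It also keeps hidden static state, so two tokenizations cannot be interleaved. As a middle ground between strtok and a full lexer, a small hand-written splitter based on strchr keeps empty fields in place. The name split_csv_simple is invented for this sketch, and the sketch assumes plain comma-separated text with no quoted fields or embedded commas.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place on commas, keeping empty fields.
 * Stores up to max_fields pointers into fields and returns how many were
 * stored. Assumes no quoting or escaping in the input. */
size_t split_csv_simple(char *line, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* strip the trailing newline */

    while (n < max_fields) {
        char *comma = strchr(p, ',');
        fields[n++] = p;                  /* field starts here, may be empty */
        if (comma == NULL)
            break;                        /* last field on the line */
        *comma = '\0';                    /* terminate the current field */
        p = comma + 1;                    /* next field starts after the comma */
    }
    return n;
}
```

With this splitter the fifth column is always fields[4] when the returned count is at least 5, even if an earlier column is empty.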

Answer 1: 0 2013-03-19 18:36:39

Answer 2: 0 2013-03-19 18:41:47