
Read a CSV file line by line and output the fifth column to use for matching in C

I am new to C, and I'm trying to find a way to read a CSV file and output the fifth field of each line until EOF.

My data looks like this:

05/02/2012 00:00:01.548,XOLT,1ZE86V280394811433,trackthepack,23.22.11.82,en_US,
05/02/2012 00:00:01.605,XOLT,1ZVzVrZVhOaGNtUnZi,hadees,50.16.47.103,en_US,VE
05/02/2012 00:00:01.647,XOLT,1ZbWhoY21GMGFHRnVY,hadees,50.19.203.230,en_US,VE
05/02/2012 00:00:02.275,XOLT,1Z4217060300279193,trackthepack,107.21.159.246,en_US,
05/02/2012 00:00:02.599,XOLT,1Z9X98040398954479,Cascademfg,66.117.15.81,en_US,NF
05/02/2012 00:00:02.639,XOLT,1Z3X252W0363295735,trackthepack,107.22.101.79,en_US,

I need to read this file, store the value of the fifth field (e.g. 23.22.11.82), and use it as a match key in further processing.

In Java, I use the following code to split each CSV line:

String delims = "[,]"; 

while ((s1 = in.readLine()) != null && s1.length() != 0){

            String[] tokens = s1.split(delims); 

Is there a similar way in C? I want to do this in C because my code runs faster there.

I tried some C code and was able to read the file (3 records), but it does not seem to detect the end of the line, and I am hitting a segmentation fault. I am using fgets and strtok.

The input file has variable-length lines delimited by commas (,). I want to get the fifth token in each line and then use it as a lookup key.

Here is the code:

#include "GeoIP.h"
#include "GeoIPCity.h"


static const char * _mk_NA( const char * p ){
 return p ? p : "N/A";
}

int 
main(int argc, char *argv[])
{
  FILE           *f;
  FILE           *out_f;
  GeoIP          *gi;
  GeoIPRecord    *gir;
  int             generate = 0;
  char            iphost[50];
  char            *nextWordPtr = NULL;
  int             wordCount =0;
  char            *rechost;
  char            recbuffer[1000];
  char delims[]=",";
  const char     *time_zone = NULL;
  char          **ret;
  if (argc == 2)
    if (!strcmp(argv[1], "gen"))
      generate = 1;

  gi = GeoIP_open("../data/GeoIPCity.dat", GEOIP_MEMORY_CACHE);

  if (gi == NULL) {
    fprintf(stderr, "Error opening database\n");
    exit(1);
  }

  f = fopen("city_test.txt", "r");

  if (f == NULL) {
    fprintf(stderr, "Error opening city_test.txt\n");
    exit(1);
  }

  out_f = fopen("out_city_lookup_test.txt", "w");

  if (out_f == NULL) {
    fprintf(stderr, "Error opening out_city_lookup_test.txt\n");
    exit(1);
  }

//** Read the file line by line and get the ip address to use to lookup GeoIP **//
//*     while (!feof(f)) {
   while (fgets(recbuffer,1001,f) != NULL) {
         nextWordPtr = strtok (recbuffer,delims); 
         while (nextWordPtr != NULL & wordCount < 5) {
           printf("word%d %s\n",wordCount,nextWordPtr);
           if (wordCount == 4 ) {
               printf("nextWordPtr %s\n",nextWordPtr);
               strcpy(iphost, nextWordPtr);
               printf("iphost %s\n",iphost);
           }    
           wordCount++;
           nextWordPtr = strtok(NULL,delims);
         }
    gir = GeoIP_record_by_name(gi, (const char *) iphost);

    if (gir != NULL) {
      ret = GeoIP_range_by_ip(gi, (const char *) iphost);
      time_zone = GeoIP_time_zone_by_country_and_region(gir->country_code, gir->region);
      printf("%s\t%s\t%s\t%s\t%s\t%s\t%f\t%f\t%d\t%d\t%s\t%s\t%s\n", iphost,
         _mk_NA(gir->country_code),
         _mk_NA(gir->region),
         _mk_NA(GeoIP_region_name_by_code(gir->country_code, gir->region)),
         _mk_NA(gir->city),
         _mk_NA(gir->postal_code),
         gir->latitude,
         gir->longitude,
         gir->metro_code,
         gir->area_code,
         _mk_NA(time_zone),
         ret[0],
         ret[1]);
      fprintf(out_f,"%s\t%s\t%s\t%s\t%s\t%s\t%f\t%f\t%d\t%d\t%s\t%s\t%s\n", iphost,
         _mk_NA(gir->country_code),
         _mk_NA(gir->region),
         _mk_NA(GeoIP_region_name_by_code(gir->country_code, gir->region)),
         _mk_NA(gir->city),
         _mk_NA(gir->postal_code),
         gir->latitude,
         gir->longitude,
         gir->metro_code,
         gir->area_code,
         _mk_NA(time_zone),
         ret[0],
         ret[1]);
      GeoIP_range_by_ip_delete(ret);
      GeoIPRecord_delete(gir);
    }
  }
  GeoIP_delete(gi);

  fclose(out_f);

  return 0;
}

Yes, it's not that elegant, but you can get the job done with strtok.
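As a minimal sketch of the strtok route (not the asker's program, and with the GeoIP lookup left out), the helper below returns the n-th token of one line. The names `nth_token` and `print_fifth_fields` are hypothetical. Two details differ from the posted loop and may be worth checking there: the token counter starts at zero for every line, and `fgets` is given `sizeof buf` rather than a size larger than the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Return the n-th (0-based) comma-separated token of `line`, or NULL if the
 * line has fewer tokens. `line` is modified in place by strtok. */
char *nth_token(char *line, int n)
{
    int count = 0;                        /* starts at zero for every line */
    char *tok = strtok(line, ",\r\n");    /* also strips the line ending */

    while (tok != NULL) {
        if (count == n)
            return tok;
        count++;
        tok = strtok(NULL, ",\r\n");
    }
    return NULL;
}

/* Print the fifth token of every line read from `in`.
 * Returns the number of lines that had a fifth token. */
int print_fifth_fields(FILE *in)
{
    char buf[1000];
    int found = 0;

    /* sizeof buf keeps fgets inside the buffer */
    while (fgets(buf, sizeof buf, in) != NULL) {
        char *ip = nth_token(buf, 4);
        if (ip != NULL) {
            printf("%s\n", ip);
            found++;
        }
    }
    return found;
}
```

The returned pointer refers into the line buffer, so it is only valid until the next `fgets` call overwrites that buffer; copy it if it has to outlive the loop iteration.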

For what you want, a better approach is a lexer. If your end goal is complex, you might want a parser as well.

I've got an example lexer and parser here. It is more complex than what you need though. If you want something simple, strtok will do the job, but you will have several nasty surprises to watch out for. It will also be difficult to use outside the simple case you have presented here.
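Some of the surprises with strtok are well known: it treats a run of delimiters as a single one, so an empty field silently disappears and later fields shift position; it overwrites the delimiters in the input buffer; and it keeps hidden state between calls, so two tokenizing loops cannot be interleaved. A rough alternative that avoids these, short of a full lexer, is to walk the commas by hand. The function name `copy_field` below is hypothetical and the code is only a sketch for unquoted CSV like the sample data.

```c
#include <stddef.h>
#include <string.h>

/* Copy the n-th (0-based) comma-separated field of `line` into `out`.
 * Empty fields count as fields. Returns 0 on success, -1 if the field is
 * missing or does not fit into `out`. `line` is not modified. */
int copy_field(const char *line, int n, char *out, size_t outsz)
{
    const char *start = line;

    /* Skip n commas to reach the start of the requested field. */
    for (int i = 0; i < n; i++) {
        start = strchr(start, ',');
        if (start == NULL)
            return -1;          /* fewer than n+1 fields */
        start++;                /* step past the comma */
    }

    /* The field ends at the next comma or at the line ending. */
    size_t len = strcspn(start, ",\r\n");
    if (len >= outsz)
        return -1;              /* would overflow the caller's buffer */

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

In a read loop this could be called as `copy_field(recbuffer, 4, iphost, sizeof iphost)`, with the lookup skipped whenever it returns -1. Quoted fields that contain commas are not handled; that is the point where a real lexer starts to pay off.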

