
How to read a CSV file into a 2D array of structs, with each element having a determining character (C, G, M) to sort it into the struct?

I am trying to create a 2D map array from the following CSV input:

   5,4
   ,,,C 200
   ,G Vibranium Shield:hands:990,,C 50
   M Healing Potion:85,,M Defence Enchanment:360,
   ,,,
   ,,G Lighsaber:hands:850,5,4

The first row gives the size of the array.

The problem I am having right now is how to still count the empty fields in the CSV, such as ",,,", as rows and columns in the array. I also need to read the determining character (C, G, M) in order to store the element in the struct. For example, in G Vibranium Shield:hands:990, G is the determining character. I store it in a char and then use a switch statement to store the remaining elements in the appropriate struct members.

I tried fgets() with strtok(), but I could not read the determining character separately from the other elements in the CSV. The other examples I found seem to require knowing in advance which elements each line contains, so the parsing is fixed up front rather than based on the determining character in the CSV. So I used fscanf to read the file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct map
{
char types;
char geartypes[100];
int coins;
int values;
char items[100];
}map;

struct map **m;

int main()
{
    FILE* mapz;
    int i,j,h;
    int width,height;
    char a;
    mapz=fopen("map.csv","r");

    if(mapz!=NULL)
    {
        fscanf(mapz,"%d,%d",&height,&width);
        map **m=(map **)malloc(height * sizeof(map *)); 
        for(i=0;i<height;i++)
        {
            m[i]=(map*)malloc(width * sizeof(map)); 
        }
        for(h=0;h<height;h++)
        {
            for(j=0;j<width;j++)
            {
                fscanf(mapz,"%c",&a);
                switch(a)
                {
                case('C'):
                    m[h][j].types=a;
                    fscanf(mapz,"%d",&m[h][j].coins);
                    break;
                case('G'):
                    m[h][j].types=a;
                    fscanf(mapz,"%[^,:]s",m[h][j].items);
                    fscanf(mapz,"%[^,:]s",m[h][j].geartypes);
                    fscanf(mapz,"%d",&m[h][j].values);
                    break;
                case('M'):
                    m[h][j].types=a;
                    fscanf(mapz,"%[^,:]s",m[h][j].items);
                    fscanf(mapz,"%d",&m[h][j].values);
                    break;
                }

            }
        }   
        for(h=0;h<height;h++)
        {
            for(j=0;j<width;j++)
            {
                switch(m[h][j].types)
                {
                case('C'):
                    printf("%c",m[h][j].types);
                    printf("%d\n",m[h][j].coins);
                    break;
                case('G'):
                    printf("%c",m[h][j].types);
                    printf("%s%s%d\n",m[h][j].items,m[h][j].geartypes,m[h][j].values);
                    break;
                case('M'):
                    printf("%c",m[h][j].types);
                    printf("%s%d\n",m[h][j].items,m[h][j].values);
                    break;
                }
            }
        }   
    }
    else
    {
        printf("No such file in directory");
    }
    fclose(mapz);
    return 0;
}

I tried fscanf, but it seems to also read the ",", which throws off the loop count. When I run the code, the output is blank.

Since you are stuck on handling empty fields when tokenizing each line, let's look at using strsep to handle that for you. There are a few caveats about using strsep. First, note the type of the first parameter: it is char **. That means you cannot read each line into a fixed character array and pass the address of that array (its type would not be char **, but char (*)[length]). Next, since strsep updates the pointer provided as the first parameter, you cannot simply give it the address of the pointer to the allocated buffer you use to store each line you read. You would lose the pointer to the start of the allocated block and be unable to free() the memory or read more than one line.

So, bottom line, you need a buffer to hold the text you are going to pass to strsep (allocated in the example below), and then you need 2 pointers: one to capture the return from strsep, and one whose address you pass to strsep (which lets you preserve your original buffer pointer).

With that in mind, you can parse your CSV with empty fields similar to:

    while (fgets (buf, MAXC, fp)) { /* read each line in file */
        size_t i = 0;       /* counter */
        p = fields = buf;   /* initialize pointers to use with strsep */
        printf ("\nline %2zu:\n", n++ + 1);         /* output heading */
        while ((p = strsep (&fields, DELIM))) {     /* call strsep */
            p[strcspn(p, "\r\n")] = 0;              /* trim '\n' (last) */
            printf ("  field %2zu: '%s'\n", i++ + 1, p); /* output field */
        }
    }
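To see why strsep is used here rather than strtok, the following is a minimal comparison sketch. The function names count_fields_strtok and count_fields_strsep are hypothetical helpers made up for illustration. It assumes a platform that provides strsep (glibc or a BSD libc), since strsep is not part of standard C.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* exposes strsep() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count fields with strtok: runs of delimiters collapse into one, so
 * empty fields are silently skipped. Modifies 'line' in place. */
size_t count_fields_strtok (char *line)
{
    size_t n = 0;

    assert (line != NULL);
    for (char *tok = strtok (line, ","); tok; tok = strtok (NULL, ","))
        n++;
    return n;
}

/* Count fields with strsep: every delimiter ends a field, so an empty
 * field comes back as "". Modifies 'line' in place. */
size_t count_fields_strsep (char *line)
{
    size_t n = 0;
    char *fields = line;    /* strsep advances this copy, not 'line' */

    assert (line != NULL);
    while (strsep (&fields, ","))
        n++;
    return n;
}
```

Called on separate copies of the row ",,,C 200", the strtok version counts 1 field and the strsep version counts 4, which is the column count the map needs.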

Putting that together in a full example using your data, you can do something similar to:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC  1024      /* if you need a constant, #define one (or more) */
#define DELIM ","       /* (numeric or string) */

int main (int argc, char **argv) {

    size_t n = 0, lines, nflds;
    char *buf, *fields, *p; /* must use 2 pointers for strsep */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    if (!(buf = malloc (MAXC))) {   /* allocate storage for buffer */
        perror ("malloc-buf");      /* cannot be array with strsep */
        return 1;
    }

    if (!fgets (buf, MAXC, fp)) {   /* read/validate 1st line */
        fputs ("error: insufficient input line 1.\n", stderr);
        return 1;
    }   /* convert to lines and no. of fields (lines not needed) */
    if (sscanf (buf, "%zu,%zu", &lines, &nflds) != 2) {
        fputs ("error: invalid format line 1.\n", stderr);
        return 1;
    }

    while (fgets (buf, MAXC, fp)) { /* read each line in file */
        size_t i = 0;       /* counter */
        p = fields = buf;   /* initialize pointers to use with strsep */
        printf ("\nline %2zu:\n", n++ + 1);         /* output heading */
        while ((p = strsep (&fields, DELIM))) {     /* call strsep */
            p[strcspn(p, "\r\n")] = 0;              /* trim '\n' (last) */
            printf ("  field %2zu: '%s'\n", i++ + 1, p); /* output field */
        }
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */
    free (buf);  /* free allocated memory */

    return 0;
}

Example Input File

$ cat dat/emptyflds.csv
5,4
,,,C 200
,G Vibranium Shield:hands:990,,C 50
M Healing Potion:85,,M Defence Enchanment:360,
,,,
,,G Lighsaber:hands:850,5,4

Example Use/Output

The example simply prints the line number and then each separated field on its own line below it, so you can confirm the separation:

$ ./bin/strcspnsepcsv <dat/emptyflds.csv

line  1:
  field  1: ''
  field  2: ''
  field  3: ''
  field  4: 'C 200'

line  2:
  field  1: ''
  field  2: 'G Vibranium Shield:hands:990'
  field  3: ''
  field  4: 'C 50'

line  3:
  field  1: 'M Healing Potion:85'
  field  2: ''
  field  3: 'M Defence Enchanment:360'
  field  4: ''

line  4:
  field  1: ''
  field  2: ''
  field  3: ''
  field  4: ''

line  5:
  field  1: ''
  field  2: ''
  field  3: 'G Lighsaber:hands:850'
  field  4: '5'
  field  5: '4'

(note: line 5 contains a 5th field, which exceeds the expected number of fields)
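A row that is longer than expected, like line 5 above, can be detected while splitting. The sketch below is one possible way to do it, not part of the program above. split_fields is a hypothetical helper that stores at most max field pointers and returns the total number of fields found, so the caller can compare that total against the column count from the first line. It again assumes strsep is available.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* exposes strsep() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split 'line' in place on commas. Stores pointers to at most 'max'
 * fields in out[] and returns the total number of fields found, so the
 * caller can spot a row that is too long (> max) or too short (< max). */
size_t split_fields (char *line, char **out, size_t max)
{
    size_t n = 0;
    char *fields = line, *p;

    assert (line != NULL && out != NULL);
    line[strcspn (line, "\r\n")] = 0;   /* trim the line ending first */
    while ((p = strsep (&fields, ","))) {
        if (n < max)
            out[n] = p;                 /* keep only the expected columns */
        n++;
    }
    return n;
}
```

With max set to the 4 columns announced in the first line, the last data row returns 5, and the caller can decide whether to reject the row or ignore the extra field.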

To handle further separation within the fields on ':' or whatever else you need, you are free to call strtok on the pointer p within the field tokenization while loop.
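As an alternative to strtok for that inner step, the determining character can select an sscanf format for each field. The following is only a sketch under assumptions: the cell struct and parse_cell function are hypothetical, loosely modeled on the struct in the question, and the field layouts ("C coins", "G item:geartype:value", "M item:value") are inferred from the sample data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {            /* hypothetical, modeled on the question's map */
    char type;              /* 'C', 'G', 'M', or 0 for an empty cell */
    char item[100];
    char geartype[100];
    int  coins;
    int  value;
} cell;

/* Parse one already-separated field, e.g. "G Vibranium Shield:hands:990".
 * Returns 1 on success (an empty field counts as success and leaves
 * type == 0), or 0 if the field does not match any known layout. */
int parse_cell (const char *field, cell *c)
{
    assert (field != NULL && c != NULL);
    memset (c, 0, sizeof *c);
    if (*field == '\0')                 /* empty cell */
        return 1;

    switch (field[0]) {                 /* the determining character */
    case 'C':
        if (sscanf (field, "C %d", &c->coins) != 1)
            return 0;
        break;
    case 'G':                           /* widths guard the 100-byte arrays */
        if (sscanf (field, "G %99[^:]:%99[^:]:%d",
                    c->item, c->geartype, &c->value) != 3)
            return 0;
        break;
    case 'M':
        if (sscanf (field, "M %99[^:]:%d", c->item, &c->value) != 2)
            return 0;
        break;
    default:                            /* unknown type, e.g. the stray "5" */
        return 0;
    }
    c->type = field[0];
    return 1;
}
```

Inside the tokenization loop this could be called as parse_cell (p, &m[row][col]) once the column index has been checked against the expected number of fields. Checking the sscanf return value is what distinguishes a well-formed field from a stray one such as '5'.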

While I have no qualms with David C. Rankin's answer, here's a different approach that uses regular expressions:

#include <assert.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <sys/types.h>
#include <regex.h>

char line[4096];

int main( int argc, char *argv[] ) {
  if( !argv[1] )
    errx(EXIT_FAILURE, "missing input"); 

  FILE *input = fopen(argv[1], "r");
  if( !input  )
    err(EXIT_FAILURE, "could not open %s", argv[1]);

  if( NULL == fgets(line, sizeof(line), input) )
    err(EXIT_FAILURE, "could not read %s", argv[1]);

  int nr, nf, nfield;
  if( 2 != sscanf(line, "%d,%d", &nr, &nfield) ) 
    err(EXIT_FAILURE, "failed to parse first line");
  printf( "reading %d lines of %d fields each\n", nr, nfield );

  int erc;
  regex_t reg;
  const char fmt[] = "([^,\n]*)[,\n]";
  char *regex = calloc( nfield, 1 + strlen(fmt) );
  for( int i=0; i < nfield; i++ ) {
    strcat(regex, fmt);
  }

  int cflags = REG_EXTENDED;
  char errbuf[128];
  size_t len = sizeof(errbuf);
  const char *truncated = "";

  if( (erc = regcomp(&reg, regex, cflags)) != 0 ) {
    if( (len = regerror(erc, &reg, errbuf, len)) > sizeof(errbuf) ) 
      truncated = "(truncated)";
    errx(EXIT_FAILURE, "%s %s", errbuf, truncated);
  }

  for( int i=0; i < nr && NULL != fgets(line, sizeof(line), input); i++ ) {
    regmatch_t matches[1 + nfield];
    const int eflags = 0;

    printf("%s", line);

    if( (erc = regexec(&reg, line, 1 + nfield, matches, eflags)) != 0 ) {
      if( (len = regerror(erc, &reg, errbuf, len)) > sizeof(errbuf) ) 
        truncated = "(truncated)";
      errx(EXIT_FAILURE, "regex error: %s %s", errbuf, truncated);
    }

    for( nf=1; nf < nfield + 1 && matches[nf].rm_so != -1; nf++ ) {
      assert(matches[nf].rm_so <= matches[nf].rm_eo);
      printf( "%4d: '%.*s'\n",
          nf,
          (int)(matches[nf].rm_eo - matches[nf].rm_so),
          line + matches[nf].rm_so );
    }
  }

  return EXIT_SUCCESS;
}  

It's only a little longer (mostly to handle errors). What I like is that once regexec(3) is called, the fields are all set up in the matches array.
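The matches array only holds byte offsets into line, so each field still has to be copied out before it can be stored in a struct. A minimal sketch of that step follows. copy_match is a hypothetical helper, not something the regex library provides.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the text of one submatch from 'line' into dst and nul-terminate it.
 * Returns the number of bytes copied, or -1 if the group did not take part
 * in the match or dst is too small (dst is left untouched in that case). */
int copy_match (const char *line, const regmatch_t *m, char *dst, size_t dstsz)
{
    assert (line != NULL && m != NULL && dst != NULL);
    if (m->rm_so == -1)                 /* group did not participate */
        return -1;

    size_t len = (size_t)(m->rm_eo - m->rm_so);
    if (len >= dstsz)                   /* no room for text plus '\0' */
        return -1;

    memcpy (dst, line + m->rm_so, len);
    dst[len] = '\0';
    return (int)len;
}
```

In the loop above, a call such as copy_match (line, &matches[nf], buf, sizeof buf) would give a nul-terminated copy of field nf, with an empty field yielding a return value of 0 and an empty string.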

Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish a post, please credit this site's URL or the original address.
