C++: read a CSV file with fgetc and split the fields on the semicolon ";"

I have to read in a CSV file with 5 fields (int, char[], char[], char[], float) that looks like this:

2345678;Meier;Hans;12.10.1985;2.4;      
1234567;Müller;Fritz;17.05.1990;1.9;

I have to put the fields into a struct and, once a line is complete, put that struct into an array of the struct type.

For the learning effect, we are only allowed to use low-level coding: functions like fgetc and strcpy, and char[] instead of strings. I wrote my algorithm to read the text file character by character, but I have problems separating the fields correctly, putting them together again, and assigning them to the right struct fields. Here is my code:

  #include <cstdlib>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string.h>

using namespace std;

int main(int argc, char **argv)
{
    struct Stud{
        long matrnr;
        char vorname[30];
        char name[30];
        char datum[30];
        float note;
    };

    const int MAX = 30;
    Stud stud;  
    Stud mystud[30]; // <<-- Array of "Stud" type
    //memset((void*)mystud,0,sizeof(mystud) * sizeof(Stud));
    int wordCounter(0);
    int i(0); //thats the charCounter or index
    int studentCounter(0);
    char wort[MAX];
    //int matrnr;
    //char vorname[MAX];
    //char name[MAX];
    //char datum[MAX];
    //float note;


  FILE * pFile;
  int cnr(0); 


  pFile=fopen("studentendaten.txt","r");  
  if (pFile==nullptr) 
  {
      perror ("Fehler beim öffnen der Datei");
  }

  else
  {       
    while (cnr != EOF) 
    {       
        (cnr=fgetc(pFile)) ;


        if ((char)cnr == '\n') {
            mystud[studentCounter] = stud;
            studentCounter++;                       
            continue;           
        }

        if ((char)cnr == ';') { 

            wort[i] = '\0'; 

            switch (wordCounter % 5) {

                case 0:             
                stud.matrnr = atol(wort);
                break;

                case 1:
                strcpy(stud.name, wort);
                break;

                case 2:
                strcpy(stud.vorname, wort);
                break;

                case 3:
                strcpy(stud.datum,wort);
                break;

                case 4:
                stud.note = atof(wort); 
                break;
            }       

            wordCounter++;          
            i = 0;
            continue;
        }

        if (wordCounter %  5 == 0 && (char)cnr != ';') {        
        wort[i] = (char)cnr;
        i++;
        //stud.matrnr = atol(wort);
        }           

        if (wordCounter % 5 == 1) {
            wort[i] =  (char)cnr;
            i++;
        //strcpy(stud.name, wort);
        }

        if (wordCounter % 5 == 2) {
            wort[i] = (char)cnr;
            i++;
            //strcpy(stud.vorname, wort);
        }

        if (wordCounter % 5 == 3) {
            wort[i] = (char)cnr;
            i++;
            //strcpy(stud.datum,wort);
        }

        if (wordCounter % 5 == 4) {
            wort[i] = (char)cnr;
            i++;
            //stud.note = atof(wort);                       
        }

    }   


    fclose (pFile);
  }

  for (int i(0) ; i <= studentCounter; i++) {
    cout << mystud[i].matrnr << "    " << mystud[i].name << "    " << mystud[i].vorname << "    "
         << mystud[i].datum << "    " << mystud[i].note << endl;
    //printf("%5ld        %5s      %5s     %5s     %5f     \n",mystud[i].matrnr,mystud[i].name,mystud[i].vorname,mystud[i].datum,mystud[i].note);
  }

  return 0;
}

I am not sure whether it has to do with wrongly incremented variables, or with the fact that I don't put a '\0' at the end of my wort[] array, so that the end of the array is not recognized. And if so, how do I do that without knowing exactly where the end is? (I don't know the length of the words.)

EDIT: I updated my code again. The only thing that puzzles me is that the last line is not being parsed correctly: it shows some rubbish, and I can't see the error in my code.

2345678;Meier;Hans;12.10.1985;2.4;      
1234567;Müller;Fritz;17.05.1990;1.9;
8392019;Thomas;Kretschmer;28.3.1920;2.5;
3471144;Mensch;Arbeit;29.2.2013;4.5;
2039482;Test;Test;30.20.2031;2.0;
7584932;Bau;Maschine;02.02.2010;2.3;
2345678;Meier;Hans;12.10.1985;2.4;      
1234567;Müller;Fritz;17.05.1990;1.9;
8392019;Thomas;Kretschmer;28.3.1920;2.5;
3471144;Mensch;Arbeit;29.2.2013;4.5;
2039482;Test;Test;30.20.2031;2.0;
7584932;Bau;Maschine;02.02.2010;2.3;
2345678;Meier;Hans;12.10.1985;2.4;      
1234567;Müller;Fritz;17.05.1990;1.9;
8392019;Thomas;Kretschmer;28.3.1920;2.5;
3471144;Mensch;Arbeit;29.2.2013;4.5;
2039482;Test;Test;30.20.2031;2.0;
7584932;Bau;Maschine;02.02.2010;2.3;
2345678;Meier;Hans;12.10.1985;2.4;      
1234567;Müller;Fritz;17.05.1990;1.9;
8392019;Thomas;Kretschmer;28.3.1920;2.5;
3471144;Mensch;Arbeit;29.2.2013;4.5;
2039482;Test;Test;30.20.2031;2.0;
7584932;Bau;Maschine;02.02.2010;2.3;

Suggestion: use a case structure for the parsing, and write yourself a "copyToSemicolon" function. Then you can write things like:

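// Fragment: `line` is assumed to hold one complete line of the file, filled by
// a hypothetical readLine() helper that loops over fgetc until '\n' or EOF.
int sIndexCount = 0;
int offset;
char temp[50];
char line[200];
while (readLine(pFile, line, sizeof line)) {
  offset = 0;
  for (int var = 0; var < 5; var++) {
    switch (var) {
    case 0:
      offset = copyToSemicolon(temp, line, offset) + 1;  // +1 skips the ';'
      mystud[sIndexCount].matrnr = atol(temp);
      break;
    case 1:
      // field order follows the question's switch: name first, then vorname
      offset = copyToSemicolon(mystud[sIndexCount].name, line, offset) + 1;
      break;
    // ... etc
    }
  }
  sIndexCount++;
  if (sIndexCount == 30) break;  // in case the input file is longer than our array (mystud[30])
}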

You also need a function copyToSemicolon that takes two char* pointers as inputs and copies characters from the second string (starting at offset) until it reaches either a semicolon or the end of the line. It returns the offset it reached (the last character read).

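int copyToSemicolon(char* dest, const char* source, int offset) {
  // stop at the delimiter, the end of the line, or the end of the string
  while(source[offset] != ';' && source[offset] != '\n' && source[offset] != '\0') {
    *dest = source[offset++];
    dest++;
  }
  *dest = '\0';  // terminate the copy so atoi/strcpy see a proper C string
  return offset;
}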
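
To show how the two pieces might fit together, here is a minimal sketch that parses one line already held in a char buffer. The Stud layout and the field order are taken from the question; parseLine is a hypothetical helper introduced only for illustration, and the sketch assumes every field fits into its destination array.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

struct Stud {
    long matrnr;
    char vorname[30];
    char name[30];
    char datum[30];
    float note;
};

// Copies characters from source, starting at offset, into dest until a ';',
// a newline, or the end of the string. dest is always null-terminated.
// Returns the offset of the character that stopped the copy.
// No bounds check: dest is assumed to be large enough for the field.
int copyToSemicolon(char* dest, const char* source, int offset) {
    assert(dest != nullptr && source != nullptr);
    while (source[offset] != ';' && source[offset] != '\n' && source[offset] != '\0') {
        *dest++ = source[offset++];
    }
    *dest = '\0';
    return offset;
}

// Hypothetical helper: fills one Stud from a line such as
// "2345678;Meier;Hans;12.10.1985;2.4;".
void parseLine(const char* line, Stud& s) {
    char temp[50];
    int offset = 0;
    for (int field = 0; field < 5; field++) {
        switch (field) {
        case 0:
            offset = copyToSemicolon(temp, line, offset);
            s.matrnr = std::atol(temp);
            break;
        case 1:
            offset = copyToSemicolon(s.name, line, offset);
            break;
        case 2:
            offset = copyToSemicolon(s.vorname, line, offset);
            break;
        case 3:
            offset = copyToSemicolon(s.datum, line, offset);
            break;
        case 4:
            offset = copyToSemicolon(temp, line, offset);
            s.note = static_cast<float>(std::atof(temp));
            break;
        }
        if (line[offset] == ';') offset++;  // step over the delimiter only
    }
}
```

Stepping over the delimiter only when it is actually a ';' keeps the offset from running past the end of a line that has fewer than five fields.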

EDIT: strtok method:

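// Fragment: `line` is assumed to hold one complete, writable line of the file,
// filled by a hypothetical readLine() helper built on fgetc.
int sIndexCount = 0;
char line[200];
char *tok;
while (readLine(pFile, line, sizeof line)) {
  tok = strtok(line, ";");  // strtok takes a string of delimiters, not a single char
  for (int var = 0; var < 5 && tok != NULL; var++) {
    switch (var) {
    case 0:
      mystud[sIndexCount].matrnr = atol(tok);
      break;
    case 1:
      strcpy(mystud[sIndexCount].name, tok);
      break;
    // ... etc
    case 4:
      mystud[sIndexCount].note = atof(tok);
      break;
    }
    tok = strtok(NULL, ";\n");  // advance to the next field
  }
  sIndexCount++;
  if (sIndexCount == 30) break;  // in case the input file is longer than our array (mystud[30])
}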
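
As a rough, self-contained illustration of the strtok approach, the sketch below splits a single line in place. The Stud layout comes from the question; parseLineStrtok and copyField are hypothetical names. Two properties of strtok are worth keeping in mind: it overwrites the delimiters with '\0', so the line must be a writable buffer, and it skips empty fields instead of returning them as empty strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct Stud {
    long matrnr;
    char vorname[30];
    char name[30];
    char datum[30];
    float note;
};

// Bounded copy that always leaves dest null-terminated.
static void copyField(char* dest, std::size_t size, const char* src) {
    std::strncpy(dest, src, size - 1);
    dest[size - 1] = '\0';
}

// Hypothetical helper: tokenizes `line` in place and fills `s`.
// Returns false if fewer than five fields were found.
bool parseLineStrtok(char* line, Stud& s) {
    assert(line != nullptr);
    char* tok = std::strtok(line, ";\n");
    for (int field = 0; field < 5; field++) {
        if (tok == nullptr) return false;  // line ended too early
        switch (field) {
        case 0: s.matrnr = std::atol(tok); break;
        case 1: copyField(s.name, sizeof s.name, tok); break;
        case 2: copyField(s.vorname, sizeof s.vorname, tok); break;
        case 3: copyField(s.datum, sizeof s.datum, tok); break;
        case 4: s.note = static_cast<float>(std::atof(tok)); break;
        }
        tok = std::strtok(nullptr, ";\n");  // advance to the next field
    }
    return true;
}
```

Because empty fields are skipped, a line like `1;;Hans;...` would shift every later field by one position; if empty fields can occur, a manual scan for ';' is the safer choice.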

One issue that I am seeing is that your code copies or parses one character at a time, so that when you're reading 2345678;Meier;Hans;12.10.1985;2.4; you first set stud.matrnr to 2, then 23, then 234, then 2345, then 23456, then 234567, then 2345678. Similarly, for stud.name, you first set it to M, then to Me, then to Mei, etc. I suggest thinking about the problem in a different way. Here is some pseudocode:

while (!eof) {
    get character from file
    if (character isn't ';' and isn't '\n') {
        copy character into buffer (increment buffer index)
    } else if (character is ';') {
        it's the end of a word.  Put it in its place - turn it to an int, copy it, whatever
        reset the buffer
    } else if (character is '\n') {
        it's the end of the last word, and the end of the line.  Handle the last word
        reset the buffer
        copy the structure
    }
}

This should make life a lot easier for you. You're not changing your data nearly as often, and if you need to debug, you can focus on each part on its own.

Generally, in programming, the first step is making sure you can say in your native language what you want to do; then it's easier to translate it to code. You're close with your implementation, and you can make it work. Just be sure you can explain what should happen when you see ';' or '\n'.
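
The pseudocode above could be turned into something like the following sketch, which stays within the fgetc/strcpy/char[] restrictions from the question. readStudents and storeField are hypothetical names, and the Stud layout is copied from the question. The sketch also covers two details that plausibly explain the rubbish in the question's last output line: EOF is tested before the character is used, and a final line without a trailing '\n' is still stored. The function returns the number of records, so a print loop should run with `i < count` rather than `i <= count`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

struct Stud {
    long matrnr;
    char vorname[30];
    char name[30];
    char datum[30];
    float note;
};

// Puts a finished word into the field selected by fieldIndex (0..4).
static void storeField(Stud& s, int fieldIndex, const char* word) {
    switch (fieldIndex) {
    case 0: s.matrnr = std::atol(word); break;
    case 1: std::strcpy(s.name, word); break;
    case 2: std::strcpy(s.vorname, word); break;
    case 3: std::strcpy(s.datum, word); break;
    case 4: s.note = static_cast<float>(std::atof(word)); break;
    default: break;  // ignore anything after the fifth field
    }
}

// Hypothetical helper: reads up to `max` records from `in` into `out`
// and returns how many records were stored.
int readStudents(std::FILE* in, Stud* out, int max) {
    assert(in != nullptr && out != nullptr);
    const int WORD_MAX = 30;  // same size as the char[] fields in Stud
    char word[WORD_MAX];
    int len = 0;    // characters collected for the current word
    int field = 0;  // index of the field currently being read
    int count = 0;  // completed records
    Stud current = {};
    int c;
    while (count < max && (c = std::fgetc(in)) != EOF) {
        if (c == ';') {            // end of a word
            word[len] = '\0';
            storeField(current, field, word);
            field++;
            len = 0;
        } else if (c == '\n') {    // end of the line
            if (field >= 5) out[count++] = current;
            field = 0;
            len = 0;
        } else if (len < WORD_MAX - 1) {
            word[len++] = static_cast<char>(c);
        }
    }
    // The last line may not end with '\n'; store it if it was complete.
    if (field >= 5 && count < max) out[count++] = current;
    return count;
}
```

Each line of the sample data ends with a ';', so every field is completed by a semicolon and the newline only has to store the record; trailing blanks after the last ';' are collected and then discarded.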

Since you have tagged this as C++, you should consider using std::getline to read a line from the file, and then std::getline(file, text_before_semicolon, ';') to parse the fields.

You could also use std::istringstream to convert the textual representation in the line to the internal numeric format.
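
A minimal sketch of that idea, for the case where std::string is allowed (which the original assignment rules out): the Student struct with std::string members and the readStudents function are assumptions made for this example, not something from the question.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed variant of the question's struct that uses std::string.
struct Student {
    long matrnr = 0;
    std::string name;
    std::string vorname;
    std::string datum;
    float note = 0.0f;
};

// Reads all complete records from `in`; lines with fewer than five
// ';'-terminated fields are skipped.
std::vector<Student> readStudents(std::istream& in) {
    assert(!in.bad());  // the stream is expected to be usable
    std::vector<Student> result;
    std::string line;
    while (std::getline(in, line)) {           // one line of the file
        std::istringstream fields(line);
        std::string matrnrText, noteText;
        Student s;
        if (std::getline(fields, matrnrText, ';') &&
            std::getline(fields, s.name, ';') &&
            std::getline(fields, s.vorname, ';') &&
            std::getline(fields, s.datum, ';') &&
            std::getline(fields, noteText, ';')) {
            // Convert the numeric fields from text.
            std::istringstream(matrnrText) >> s.matrnr;
            std::istringstream(noteText) >> s.note;
            result.push_back(s);
        }
    }
    return result;
}
```

In a program this would typically be called with a std::ifstream opened on studentendaten.txt; because the function takes a std::istream&, it can also be exercised with a std::istringstream.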
