
Struggling to read in a complex .csv file with C++

I'm trying to read in a .csv file and store it in a vector of structs. My program works on a much smaller and simpler file, but it did not scale up. My main problem right now is the error "error: no matching function for call to 'getline(std::string&, char)' 30 | getline(e.ea, ',');", even though I'm trying to pass in a string.

I've tried to put the input into a vector directly, instead of using getline, but it became pretty complicated quickly and I'm a total beginner.

This is my code:

#include <string>
#include <fstream>
#include <iomanip>
#include <vector>
#include <sstream>
using namespace std;

struct Entry {
    string eb, ed, ee, ef, eh, ei, ej, el, ek, em, en, er, es, et, eu, ev, ew, ex, ey, ez, ea, eg, ec, eo, ep, eq;
    
        friend ostream& operator<<(ostream& os, const Entry e);
        friend istream& operator>>(istream& is, Entry& e);


};

Entry parse_Line(ifstream &source);
bool read_File(const char*);
void write_File(vector <Entry>& data);

//overloading operator << and >> to be able to print out the information needed.
ostream& operator<<(ostream& os, const Entry e)
{
    os << "d: " << e.ed << " e: " << e.ee << " f: " << e.ef << " h: " << e.ei << " m: " << e.em << "\n";
    return os;
}

istream& operator>>(istream& is, Entry& e){
    getline(e.ea, ',');
    getline(is >> ws, e.eb, ',');
    getline(is >> ws, e.ec, ',');
    getline(is >> ws, e.ed, ',');
    getline(is >> ws, e.ee, ',');
    getline(is >> ws, e.ef, ',');
    getline(is >> ws, e.eg, ',');
    getline(is >> ws, e.eh, ',');
    getline(is >> ws, e.ei, ',');
    getline(is >> ws, e.ej, ',');
    getline(is >> ws, e.ek, ',');
    getline(is >> ws, e.el, ',');
    getline(is >> ws, e.em, ',');
    getline(is >> ws, e.en, ',');
    getline(is >> ws, e.eo, ',');
    getline(is >> ws, e.ep, ',');
    getline(is >> ws, e.eq, ',');
    getline(is >> ws, e.er, ',');
    getline(is >> ws, e.es, ',');
    getline(is >> ws, e.et, ',');
    getline(is >> ws, e.eu, ',');
    getline(is >> ws, e.ev, ',');
    getline(is >> ws, e.ew, ',');
    getline(is >> ws, e.ex, ',');
    getline(is >> ws, e.ey, ',');
    
    return(is >> e.ez);
} 


Entry parse_Line(ifstream& source){
    string eb, ed, ee, ef, eh, ei, ej, el, ek, em, en, er, es, et, eu, ev, ew, ex, ey, ez, ea, eg, ec, eo, ep, eq;
    Entry tempEntry;
    
    //scan a line from the file
    source >> ea >> eb >> ec >> ed >> ef >> eg >> eh >> ei >> ej >> ek >> el >> em >> en >> eo >> ep >> eq >> er >> es >> et >> eu >> ev >> ew >> ex >> ey >> ez;
    
    /*while(getline(str, word, ','))
        row.push_back(word);
        content.push_back(row);*/
    
    
    //assign data to tempEntry
    tempEntry.ea = ea;
    tempEntry.eb = eb;
    tempEntry.ec = ec;  
    tempEntry.ed = ed;
    tempEntry.ee = ee;
    tempEntry.ef = ef;
    tempEntry.eg = eg;
    tempEntry.eh = eh;
    tempEntry.ei = ei;
    tempEntry.ej = ej;
    tempEntry.ek = ek;
    tempEntry.el = el;
    tempEntry.em = em;
    tempEntry.en = en;
    tempEntry.eo = eo;
    tempEntry.ep = ep;
    tempEntry.eq = eq;
    tempEntry.er = er;
    tempEntry.es = es;
    tempEntry.et = et;
    tempEntry.eu = eu;
    tempEntry.ev = ev;
    tempEntry.ew = ew;
    tempEntry.ex = ex;
    tempEntry.ey = ey;
    tempEntry.ez = ez;
    return tempEntry;
} 

bool read_File(const char* fileName, vector <Entry>& allData){
//take in file name and name of struct created to store data.
    string line;
    
    ifstream fileInput;
    fileInput.open(fileName, ios::in);
    
    if (fileInput.is_open()){
        // take each line, put it into the parse_Line function, then put it into the allData vector.
        for (Entry e; fileInput >> e; allData.push_back(move(e)));
            
        fileInput.close();
        
         
        write_File(allData);
        return true;
    } else {
        return false;
    }
    
}

void write_File(vector <Entry>& data){
    //use vector passed in and print it to console for now. will change to printing a new csv file
    for (int i=0; i<=data.size(); i++ ){
        cout << data[i] << " ";
    }
    
    return;
}

int main (int argc, char* argv[]) {
    //check for file
    if (argc < 2){
        return(cout << "No file name specified\n"),1;
    }
    //read in file name to a function using following:
    string str(argv[1]);
    vector <Entry> data;
    
    if (!read_File(argv[1], data)){
        return(cout << "That file name is invalid\n"), 2;
    }
    
    const char* nameStr = str.c_str();
    read_File(nameStr, data);
    

    return 0;
} 

This is a simplified version of my input file (the real file will actually have paragraphs in each entry).

3902,string1,3,string two,string three,string 4,string five,230,string 6,string seven,string 8,string nine,stringten,string11,string12,string13,43,34,89,string 14,string 15,string 16,string 17,string eighteen,string nineteen,string twenty,string twenty one,string 22

92,b,324,c,d,e,f,g,h,i,j,k,l,m,n,43l,93403,392,r,s,t,u,v,w,x,y,z

Your code does not compile because there are only two kinds of getline(), and the first line of your operator>> matches neither of them:

  • std::getline(), a free function that requires an istream& as its first argument;
  • std::istream::getline(), a member function of istream that can only read into a character array of known size.

So:

istream& operator>>(istream& is, Entry& e){
    getline(e.ea, ',');
    ...

could only be

    getline (is, e.ea, ',');   // or is>>ws

Your approach is unfortunately flawed. The main issue is that is >> ws skips whitespace, and whitespace means not only ' ' but also newlines. So if a line in the file has too few or too many fields, you quickly end up reading the wrong information into the wrong entry.

To make things even worse, getline(is, ..., ',') stops only at a comma and keeps newlines in the string as if they were ordinary characters. Again, if the input file is missing some fields, you may end up reading the wrong information into the wrong entry.
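A minimal sketch of that second problem, assuming a hypothetical helper named split_on_commas_only (it is not part of the code in the question): when ',' is the only delimiter, the last field of one line and the first field of the next line come back as a single token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes text using ',' as the only delimiter,
// the way repeated getline(is, field, ',') calls do.
std::vector<std::string> split_on_commas_only(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(in, token, ',')) {
        tokens.push_back(token);  // a '\n' inside the token is kept as-is
    }
    return tokens;
}
```

For the two-line input "a,b\nc,d" this yields three tokens, and the middle one is "b\nc": the line break has been swallowed into a field.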

The csv file format is line-oriented, so the best you can do is implement an algorithm that never misses a line break. The usual trick is to use getline() to read a full line, and then parse the fields in that string using an istringstream. This way, if there's an error in the input file, you'll spot it easily and won't get caught out by mismatched fields.
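The line-by-line approach could be sketched as follows. The names split_line and read_rows, the expected_fields parameter and the bad_lines output are assumptions made for this illustration, and the sketch does no quote handling at all.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits ONE line into fields on ','. No quote support.
std::vector<std::string> split_line(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    // getline does not report an empty field after a trailing comma; add it back.
    if (!line.empty() && line.back() == ',') {
        fields.push_back("");
    }
    return fields;
}

// Hypothetical helper: reads the stream line by line, so a malformed line can
// never shift fields into the next record. Blank lines are skipped; lines whose
// field count differs from expected_fields are recorded in bad_lines (1-based).
std::vector<std::vector<std::string>> read_rows(std::istream& in,
                                                std::size_t expected_fields,
                                                std::vector<std::size_t>& bad_lines) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        if (line.empty()) {
            continue;
        }
        std::vector<std::string> fields = split_line(line);
        if (fields.size() != expected_fields) {
            bad_lines.push_back(line_no);
            continue;
        }
        rows.push_back(std::move(fields));
    }
    return rows;
}
```

With something like this, read_File would open the ifstream, call read_rows with the number of columns it expects, and copy each row into an Entry. Any line with the wrong number of fields shows up in bad_lines instead of silently corrupting the rows after it.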

Now, if you need full support for RFC 4180 compliant csv, it gets even more complex: you have to support quoted fields, and a newline inside quotes is part of the field rather than the end of the record. This requires more elaborate parsing: reading character by character and tracking the quote state, so that fields are split correctly and commas and line feeds enclosed in quotes are not treated as separators.

I decided to use rapidCSV by d99kris et al. While I would have loved to post a direct solution to my problem for future reference, there is no point in reinventing the wheel when the rapidCSV single header does what I need it to do.
