
Having trouble using getline to read from a .csv file

Part of my program reads lines from a .csv file and stores parts of each line in a struct. When I ran the code shown below, however, the console reported an invalid instance of stod . Using the debugger, I found that nothing was being read from the file into the dummy variables I created ( part1 - part4 ); they all still held "".

An example line from the .csv I am reading would be:

"Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,10;000+,Free,0,Everyone,Art & Design,January 7; 2018,1.0.0,4.0.3 and up"

where I only want the fields up to and including "159" (the number of reviews).

#include <iostream>
#include <cmath>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>

using namespace std;

struct App {
    std::string name;
    std::string category;
    double rating;
    int reviewNum;
};

void readData(App appList[], const int size) {

    ifstream inFile;
    inFile.open("googleplaystore.csv");

    if(inFile.fail()) {
        cout << "File open error!";
        exit(0);  //Close program
    }

    for(int appIndex = 0; appIndex < size; appIndex++) {

        string part1, part2, part3, part4, trash = "";
        //Line1
        getline(inFile, part1, ',');    //read until ,
        getline(inFile, part2, ',');
        getline(inFile, part3, ',');
        getline(inFile, part4 , ',');
        getline(inFile, trash);         //read until end of line

        appList[appIndex].name = part1;
        appList[appIndex].category = part2;
        appList[appIndex].rating = stod(part3);
        appList[appIndex].reviewNum = stoi(part4);

        if(inFile.fail()) {
            cout << "File read error!";
            exit(0);  //Close program
        }

    }
}

int main()
{
    cout << fixed << setprecision(1);

    App appList[NUM_RECORDS] = {};

    readData(appList, NUM_RECORDS);
}

Your program checks for errors incorrectly. It tries to read past the end of the file, fails, ignores the error, and then tries to convert the nonexistent input string to double . This fails with an exception.

It is too late to check the file after the program tries to do all this.

Check every IO operation for success, immediately after it is done.

One popular way of doing this is to read the input line by line and then parse each line separately, e.g. as in the snippet below.

while (std::getline(infile, instring)) {
   std::istringstream linestream (instring);
   // read the linestream
}
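The "read the linestream" step could be filled in along these lines. This is only a sketch, not the answerer's code: the helper names parse_line and read_apps are made up for illustration, the App struct is copied from the question, and lines that fail to parse (a header row, for instance) are simply skipped, which is an assumption about what you want.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct App {
    std::string name;
    std::string category;
    double rating = 0.0;
    int reviewNum = 0;
};

// Hypothetical helper: parse the first four fields of one CSV line.
// Returns false if a field is missing or a number cannot be converted.
bool parse_line(const std::string& line, App& out) {
    std::istringstream linestream(line);
    std::string rating, reviews;
    // Every getline is checked immediately; && stops at the first failure.
    if (!(std::getline(linestream, out.name, ',') &&
          std::getline(linestream, out.category, ',') &&
          std::getline(linestream, rating, ',') &&
          std::getline(linestream, reviews, ',')))
        return false;
    try {
        out.rating = std::stod(rating);     // throws on non-numeric text
        out.reviewNum = std::stoi(reviews);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}

// Hypothetical helper: read every line, keep only the ones that parse.
std::vector<App> read_apps(std::istream& in) {
    std::vector<App> apps;
    std::string line;
    while (std::getline(in, line)) {        // loop ends when the read fails
        App app;
        if (parse_line(line, app))
            apps.push_back(app);
    }
    return apps;
}
```

Because read_apps takes a std::istream& , it can be handed an open std::ifstream or, for testing, a std::istringstream .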

In addition to the problems pointed out by @n. 'pronouns' m. with your failure to validate the return of getline , you will quickly run into the limitations of parsing a .csv file by successive calls to getline with ',' (or any other non-newline character) as the delimiter. When you specify a delimiter other than '\n' , getline treats '\n' as an ordinary character and reads straight past it looking for the next ',' , so if you read every field this way, the last field of one line runs into the first field of the next.
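A small sketch of that effect, using a std::istringstream in place of the file. The helper name split_on_comma_only is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a whole input on ',' only, the way repeated
// calls to getline(in, field, ',') would. '\n' is not treated specially.
std::vector<std::string> split_on_comma_only(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, ','))
        fields.push_back(field);
    return fields;
}
```

Calling split_on_comma_only("a,b\nc,d\n") yields three fields rather than four: "a", then "b\nc" (the newline is embedded in the field), then "d\n".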

Instead, it is common to create a stringstream from the line read from the .csv and then parse what you need from the stringstream using getline and a delimiter. Why? You only have a line of data in the stringstream so getline must stop after reading the last field -- because the stringstream will be empty... In your case since you are not reading all fields, you can utilize a field counter and a temporary instance of App to fill with data. You can simply switch(fieldcount) {...} to separate the data read from the stringstream into the correct member variable.

You are already using std::string , so you may as well #include <vector> and use a std::vector<App> for your storage rather than a plain old array. You can simply make std::vector<App> the return type of your read function, fill the vector as you read the .csv file, and then return the vector for use elsewhere in your program.

Putting the pieces together, you could define your read function as follows:

/* function reading csv and returning std::vector<App> containing
 * first 4 fields from each line of input .csv
 */
std::vector<App> read_csv (const std::string& name)
{
    std::vector<App> v {};          /* vector of App */
    std::string line;               /* string to hold each line */
    std::ifstream fin (name);       /* input file stream */
    ...

Now simply declare a temporary instance of App , create a std::stringstream from line and a std::string to hold each field, and loop getting each field from your stringstream, e.g.

    while (std::getline (fin, line)) {      /* read entire line into line */
        App tmp;                            /* temporary instance to fill */
        size_t nfield = 0;                  /* field counter for switch */
        std::string field;                  /* string to hold each field */
        std::stringstream ss (line);        /* create stringstream from line */
        while (getline (ss, field, ',')) {  /* read each field from line */
            ...

Now that you have your field, simply switch on your field counter to assign it to the correct member variable, and after filling the 4th field, add your temporary instance of App to your vector, e.g.

            switch (nfield) {                           /* switch on nfield */
                case 0: tmp.name = field; break;        /* fill name */
                case 1: tmp.category = field; break;    /* fill category */
                case 2: try {               /* convert field to double */
                        tmp.rating = stod (field);
                    }
                    catch (const std::exception & e) {
                        std::cerr << "error invalid tmp.rating: " << 
                                    e.what() << '\n';
                        goto nextline;
                    }
                    break;
                case 3: try {               /* convert field to int */
                        tmp.reviewNum = stoi (field);
                        v.push_back(tmp);
                    }
                    catch (const std::exception & e) {
                        std::cerr << "error invalid tmp.reviewNum: " << 
                                    e.what() << '\n';
                    }
                    goto nextline;   /* all done with fields, get next line */
                    break;
            }
            ...

All that remains in your read loop is updating your field counter nfield and providing a label so you can break out of the nested loop and switch , e.g.

            nfield++;   /* increment field counter */
        }
        nextline:;      /* label for nextline */
    }

After all lines are read from the file, simply return your vector:

    return v;   /* return filled vector of App */
}

You would call your read function from main() similar to:

     /* fill vector of App from csv */
    std::vector<App> appdata = read_csv (argv[1]);

Putting it all together in a short example, you could do the following to read all wanted information from your googleplay .csv into a vector of App :

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct App {
    std::string name;
    std::string category;
    double rating;
    int reviewNum;
};

/* function reading csv and returning std::vector<App> containing
 * first 4 fields from each line of input .csv
 */
std::vector<App> read_csv (const std::string& name)
{
    std::vector<App> v {};          /* vector of App */
    std::string line;               /* string to hold each line */
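    std::ifstream fin (name);       /* input file stream */

    if (!fin) {                     /* validate file open for reading */
        std::cerr << "error: unable to open " << name << '\n';
        return v;
    }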

    while (std::getline (fin, line)) {      /* read entire line into line */
        App tmp;                            /* temporary instance to fill */
        size_t nfield = 0;                  /* field counter for switch */
        std::string field;                  /* string to hold each field */
        std::stringstream ss (line);        /* create stringstream from line */
        while (getline (ss, field, ',')) {  /* read each field from line */
            switch (nfield) {                           /* switch on nfield */
                case 0: tmp.name = field; break;        /* fill name */
                case 1: tmp.category = field; break;    /* fill category */
                case 2: try {               /* convert field to double */
                        tmp.rating = stod (field);
                    }
                    catch (const std::exception & e) {
                        std::cerr << "error invalid tmp.rating: " << 
                                    e.what() << '\n';
                        goto nextline;
                    }
                    break;
                case 3: try {               /* convert field to int */
                        tmp.reviewNum = stoi (field);
                        v.push_back(tmp);
                    }
                    catch (const std::exception & e) {
                        std::cerr << "error invalid tmp.reviewNum: " << 
                                    e.what() << '\n';
                    }
                    goto nextline;   /* all done with fields, get next line */
                    break;
            }
            nfield++;   /* increment field counter */
        }
        nextline:;      /* label for nextline */
    }

    return v;   /* return filled vector of App */
}

int main (int argc, char **argv) {

    if (argc < 2) {
        std::cerr << "error: insufficient input\n" <<
                     "usage: " << argv[0] << " <file>\n";
        return 1;
    }

    /* fill vector of App from csv */
    std::vector<App> appdata = read_csv (argv[1]);

    for (auto& v : appdata)     /* output results */
        std::cout << "\nname     : " << v.name
                  << "\ncategory : " << v.category
                  << "\nrating   : " << v.rating
                  << "\nreviewNum: " << v.reviewNum << '\n';
}

Example Use/Output

Using the one-line of input provided in the file dat/googleplay.csv , you would receive the following output from the program:

$ ./bin/read_googlecsv dat/googleplay.csv

name     : Photo Editor & Candy Camera & Grid & ScrapBook
category : ART_AND_DESIGN
rating   : 4.1
reviewNum: 159

Reading each line and using a std::stringstream to parse your fields from the comma separated values solves a number of problems you will face if you need to utilize all fields. It also allows consuming an entire line of input on each call to getline preventing a partial read of a line in the event of a formatting problem with the file. Look things over and consider the advantages of using a vector of App rather than a plain old array. Let me know if you have further questions.

Although an answer is already accepted, I would like to show a more "modern" C++ approach.

I would be happy, if you could study this solution and try to use some features in the future.

In the object-oriented world, we use classes (or structs) and put data, and the functions operating on this data, in one (encapsulated) object.

Only the class should know how to read and write its data, not some outside global functions. Therefore I added two functions to your struct: overloads of the inserter and extractor operators, defined as friends inside the struct.

In the extractor, we use the standard library to split a string into tokens. The std::sregex_token_iterator exists for exactly this purpose, so we should use it. Besides, it is very simple to use.

With the one-liner below, we split the complete string into tokens and put the resulting tokens in a std::vector :

std::vector token(std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1), {});

Then we copy the resulting data in our member variables.
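To see the tokenizing step in isolation, here is a minimal sketch. The function name split is made up for illustration, and it stores std::string copies of the tokens instead of relying on class template argument deduction as the one-liner above does.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a line on every match of `delimiter`.
// Submatch index -1 tells the iterator to yield the text BETWEEN matches.
std::vector<std::string> split(const std::string& line,
                               const std::regex& delimiter) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1),
        std::sregex_token_iterator());  // default-constructed = end of range
}
```

For example, split("Photo Editor,ART_AND_DESIGN,4.1,159", std::regex(",")) returns four strings, the last of which is "159".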

For demo output I have also overloaded the inserter operator. Now you can use the extractor and inserter operators (">>" and "<<") with variables of type App, just as with any built-in C++ type.

In main, we also use a very simple approach. First, we open the file and check whether that succeeded.

Then we define a variable "apps" (a std::vector of App) and use its range constructor and the std::istream_iterator to read the complete file. Since App has an overloaded extractor operator, it knows how to read itself and will parse the complete CSV file for us.

Again, the very simple and short one-liner

std::vector apps(std::istream_iterator<App>(inFile), {});

will read the complete source file, all lines, parse the lines and store the member variables in the single App elements of the resulting std::vector .

Please see the complete example below:

#include <string>
#include <iostream>
#include <vector>
#include <fstream>
#include <regex>
#include <iterator>
#include <algorithm>

std::regex delimiter{ "," };

struct App {
    // The data. Member variables
    std::string name{};
    std::string category{};
    double rating{};
    int reviewNum{};

    // Overload the extractor operator
    friend std::istream& operator >> (std::istream& is, App& app) {

        // Read a complete line
        if (std::string line{}; std::getline(is, line)) {
            // Tokenize it
            std::vector token(std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1), {});
            // If we read at least 4 tokens then assign the values to our struct
            if (4U <= token.size()) {
                // Now copy the data from the vector to our members
                app.name = token[0]; app.category = token[1]; 
                app.rating = std::stod(token[2]); app.reviewNum = std::stoi(token[3]);
            }
        }
        return is;
    }

    // Overload the inserter operator
    friend std::ostream& operator << (std::ostream& os, const App& app) {
        return os << "Name:     " << app.name << "\nCategory: " << app.category
            << "\nRating:   " << app.rating << "\nReviews:  " << app.reviewNum;
    }
};

int main() {

    // Open file and check, if it could be opened
    if (std::ifstream inFile("googleplaystore.csv"); inFile) {

        // Define the variable and use range constructor to read and parse the complete file
        std::vector apps(std::istream_iterator<App>(inFile), {});

        // Show result to the user
        std::copy(apps.begin(), apps.end(), std::ostream_iterator<App>(std::cout, "\n"));
    }
    return 0;
}

What a pity that nobody will read this . . .

Disclaimer: This is pure example code, not productive, so without error handling.
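If you do want error handling in the extractor, one possible shape is sketched below. It is an assumption about the desired behaviour, not part of the answer above: a line with fewer than four fields, or with a non-numeric rating or review count, sets failbit on the stream and leaves the App untouched.

```cpp
#include <exception>
#include <istream>
#include <regex>
#include <string>
#include <vector>

struct App {
    std::string name{};
    std::string category{};
    double rating{};
    int reviewNum{};

    friend std::istream& operator>>(std::istream& is, App& app) {
        static const std::regex delimiter(",");
        std::string line;
        if (!std::getline(is, line))
            return is;                      // end of input or read error
        const std::vector<std::string> token(
            std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1),
            std::sregex_token_iterator());
        try {
            if (token.size() >= 4) {
                // Convert first, assign afterwards, so a failed conversion
                // cannot leave `app` half-updated.
                const double rating = std::stod(token[2]);
                const int reviews = std::stoi(token[3]);
                app.name = token[0];
                app.category = token[1];
                app.rating = rating;
                app.reviewNum = reviews;
                return is;
            }
        } catch (const std::exception&) {
            // stod/stoi rejected the text; report it as a stream failure below
        }
        is.setstate(std::ios_base::failbit); // mark the extraction as failed
        return is;
    }
};
```

Note the consequence for std::istream_iterator<App> : it stops at the first malformed line, so a header row would have to be skipped (for example with one std::getline call) before constructing the vector.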

