
Reading only the formatted data from a file that contains both formatted and unformatted data, using C++

I have a data file with an unknown amount of unformatted, unneeded data at the start and end of the file. In the middle, however, the data is precisely formatted, and the first column always starts with one of a couple of keywords. I want to skip to this part and read in that data, assigning each column to a variable. This would be simple if it weren't for the "garbage" text at the start and end.

Here is a simple example problem. In my real code, each variable is part of a structure. I don't think this matters, but I mention it just in case.

Here is my text file. I want every line that starts with keyword, with all of its columns assigned to variables:


REMARK: this should be simpler
REMARK: yes, it should
REMARK: it is simple, you just don't see it yet
Comment that doesn't start with REMARK
keyword aaa 1 bbb 1 1.2555  O
keyword aaa 1 bbb 2 2.2555  H
keyword aaa 1 bbb 3 3.2555  C
keyword aaa 1 bbb 4 4.2555  C
END
Arbitrary garbage texts

If there were no random comments, I could use:

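#include <fstream>
#include <string>
using namespace std;

int main()
{
    string filename = "textfile.pdb";
    string keyword, name1, name2, name3;
    int int1, int2;
    double number;

    ifstream inFile(filename);

    // loop on the extraction itself, so a failed read ends the loop
    while (inFile >> keyword >> name1 >> int1 >> name2 >> int2 >> number >> name3)
    {
        // use the values here
    }
    inFile.close();
}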

I tried getting around this by using

while (getline(inFile,line))

This lets me look at each line and check whether it contains "keyword", but then I can't use the convenient formatted input of the first method. I would need to parse the string myself, which seems tricky in C++. I tried using sscanf, but it complained about a conversion from string to char.

The first method is nicer; I just don't know how to add a check so that a line is read into the variables only if it is one of the formatted lines.

I'd suggest something like this (see also the question "Parsing text file in C++"):

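#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    string line, name, age, salary, hoursWorked, randomText;
    ifstream readFile("textfile.txt");

    while (getline(readFile, line)) {
        stringstream iss(line);              // parse this line only
        getline(iss, name, ':');             // text up to ':'
        getline(iss, age, '-');              // then up to '-'
        getline(iss, salary, ',');           // then up to ','
        getline(iss, hoursWorked, '[');      // then up to '['
        getline(iss, randomText, ']');       // bracketed text
    }
    readFile.close();
}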

You can easily locate only the formatted lines you are interested in by reading each line, creating a stringstream from it, and validating that the line begins with "keyword" and contains each remaining item. Since you are using a stringstream, you need not read every value as a string; you can read each value directly as the desired type. If the line begins with END, you are done reading, so just break;. Otherwise, if the first word is not "keyword", just read the next line from the file and try again.

After opening an ifstream to your data file as f, you could do the following to locate and parse the wanted data:

    while (std::getline (f, line)) {    /* read each line */
        int aval, bval;                 /* local vars for parsing line */
        double dblval;
        std::string kw, a, b, ccode;
        std::stringstream s (line);     /* stringstream to parse line */
        if (!(s >> kw))                 /* skip blank lines */
            continue;
        /* if 1st word not keyword, handle line appropriately */
        if (kw != "keyword") {
            if (kw == "END")            /* done with data */
                break;
            continue;                   /* otherwise get next line */
        }   /* read/validate all other data values */
        if ((s >> a) && (s >> aval) && (s >> b) && (s >> bval) &&
            (s >> dblval) && (s >> ccode))
            std::cout << kw << " " << a << " " << aval << " " << b <<
                    " " << bval << " " << dblval << " " << ccode << '\n';
        else    /* otherwise invalid data line */
            std::cerr << "error: invalid data: " << line << '\n';
    }

(This just writes the wanted values to stdout; you can use them however you need.)

Putting it all together in a short example that uses your data, you could do something similar to:

#include <cstdio>       /* perror */
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

int main (int argc, char **argv) {

    std::string line;   /* string to hold each line */

    if (argc < 2) {     /* validate at least 1 argument given */
        std::cerr << "error: insufficient input.\n"
                    "usage: " << argv[0] << " filename\n";
        return 1;
    }

    std::ifstream f (argv[1]);   /* open file */
    if (!f.is_open()) {     /* validate file open for reading */
        perror (("error while opening file " + 
                std::string(argv[1])).c_str());
        return 1;
    }

    while (std::getline (f, line)) {    /* read each line */
        int aval, bval;                 /* local vars for parsing line */
        double dblval;
        std::string kw, a, b, ccode;
        std::stringstream s (line);     /* stringstream to parse line */
        if (!(s >> kw))                 /* skip blank lines */
            continue;
        /* if 1st word not keyword, handle line appropriately */
        if (kw != "keyword") {
            if (kw == "END")            /* done with data */
                break;
            continue;                   /* otherwise get next line */
        }   /* read/validate all other data values */
        if ((s >> a) && (s >> aval) && (s >> b) && (s >> bval) &&
            (s >> dblval) && (s >> ccode))
            std::cout << kw << " " << a << " " << aval << " " << b <<
                    " " << bval << " " << dblval << " " << ccode << '\n';
        else    /* otherwise invalid data line */
            std::cerr << "error: invalid data: " << line << '\n';
    }
    f.close();
}

Example Input File

$ cat dat/formatted_only.txt
REMARK: this should be simpler
REMARK: yes, it should
REMARK: it is simple, you just don't see it yet
Comment that doesn't start with REMARK
keyword aaa 1 bbb 1 1.2555  O
keyword aaa 1 bbb 2 2.2555  H
keyword aaa 1 bbb 3 3.2555  C
keyword aaa 1 bbb 4 4.2555  C
END
Arbitrary garbage texts

Example Use/Output

$ ./bin/sstream_formatted_only dat/formatted_only.txt
keyword aaa 1 bbb 1 1.2555 O
keyword aaa 1 bbb 2 2.2555 H
keyword aaa 1 bbb 3 3.2555 C
keyword aaa 1 bbb 4 4.2555 C

Look things over and let me know if you have further questions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM