
How to eliminate this extra element?

I am writing a C++ function that reads the nth column of a tab-delimited text file. Here is what I have so far:

typedef unsigned int  uint;


inline void fileExists (const std::string& name) {
    if ( access( name.c_str(), F_OK ) == -1 ) {
        throw std::string("File does not exist!");
    }
}

size_t bimNCols(std::string fn) {
    try {
        fileExists(fn);
        std::ifstream in_file(fn);
        std::string tmpline;
        std::getline(in_file, tmpline);
        std::vector<std::string> strs;
        strs = boost::split(strs, tmpline, boost::is_any_of("\t"), boost::token_compress_on);
        return strs.size();
    } catch (const std::string& e) {
        std::cerr << "\n" << e << "\n";
        exit(EXIT_FAILURE);
    }
}

typedef std::vector<std::string> vecStr;

vecStr bimReadCol(std::string fn, uint ncol_select) {
    try {
        size_t ncols = bimNCols(fn);
        if(ncol_select < 1 or ncol_select > ncols) {
            throw std::string("Your column selection is out of range!");
        }

        std::ifstream in_file(fn);
        std::string tmpword;
        vecStr colsel; // holds the column of strings
        while (in_file) {
            for(int i=1; i<ncol_select; i++) {
                in_file >> tmpword;
            }
            in_file >> tmpword;
            colsel.push_back(tmpword);
            in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
        return colsel;

    } catch (const std::string& e) {
        std::cerr << "\n" << e << "\n";
        exit(EXIT_FAILURE);
    }
}

The problem is in the bimReadCol function: after the last line of the file has been handled by

in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

in_file.good() still evaluates to true. So, suppose I have a text file test.txt like this:

a 1 b 2
a 1 b 2
a 1 b 2

bimReadCol("test.txt", 3) returns the vector (b, b, b, b), with one extra element. Any idea how to fix this?

The usual solution for line oriented input is to read line by line, then parse each line:

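std::string line;
while ( std::getline( in_file, line ) ) {
    std::istringstream parser( line );
    //  Test the counter first, so that no field beyond the
    //  selected column is extracted into tmpword.
    for ( uint i = 1; i <= ncol_select && parser >> tmpword; ++ i ) {
    }
    if ( parser ) {
        //  Every extraction up to the selected column succeeded.
        colsel.push_back( tmpword );
    }
    //  No need for any ignore.
}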

The important thing is that you must always test after the input (be it from in_file or parser) and before you use the value. A test made before the value is read doesn't mean anything (as you've seen).
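For reference, here is a self-contained sketch of the line-by-line approach described above. The function name `readColumn`, and the choice to take a `std::istream&` so that it can be tried on a `std::istringstream` instead of a file, are assumptions made for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based column `column` of every line
// in `in` that has at least that many whitespace-separated fields.
std::vector<std::string> readColumn(std::istream& in, unsigned column) {
    assert(column >= 1);  // columns are numbered from 1
    std::vector<std::string> result;
    std::string line;
    while (std::getline(in, line)) {  // the read itself is the loop test
        std::istringstream parser(line);
        std::string word;
        unsigned i = 1;
        // Check the counter first so nothing past the wanted field is read.
        while (i <= column && parser >> word) {
            ++i;
        }
        if (parser) {  // all `column` extractions succeeded
            result.push_back(word);
        }
    }
    return result;
}
```

Lines with fewer fields than `column` (including empty lines) are skipped rather than reported. Note that `>>` splits on any whitespace, not only on tabs, so a field that contains spaces would be split in two. For strictly tab-delimited data, `std::getline(parser, word, '\t')` is one possible substitute for the extraction.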

OK, I got it. The last line of the text file does not end with a newline, which is why in_file still evaluates to true after the last line.

I think I should calculate the number of lines of the file, then replace while(in_file) with a for loop.

If someone has a better idea, please post it and I will accept.

Update

The fix turns out to be rather simple. Just check whether tmpword is empty:

vecStr bimReadCol(std::string fn, uint ncol_select) {
    try {
        size_t ncols = bimNCols(fn);
        if(ncol_select < 1 or ncol_select > ncols) {
            throw std::string("Your column selection is out of range!");
        }

        std::ifstream in_file(fn);
        vecStr colsel; // holds the column of strings
        std::string tmpword;
        while (in_file) {
            tmpword = "";
            for(int i=1; i<=ncol_select; i++) {
                in_file >> tmpword;
            }
            if(tmpword != "") {
                colsel.push_back(tmpword);
            }
            in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
        return colsel;

    } catch (const std::string& e) {
        std::cerr << "\n" << e << "\n";
        exit(EXIT_FAILURE);
    }
}

As @James Kanze has pointed out, in_file still evaluates to true even if the last line ends with a newline. But since we are then at the end of the file, the next extraction fails and tmpword stays empty, so we are fine as long as we check for that.
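To make the stream-state behaviour concrete, here is a minimal sketch that runs the same loop twice on an in-memory stream: once testing the stream before the extraction, as in the question, and once using the extraction itself as the loop condition. Both function names are hypothetical, and only the first field of each line is read to keep the example short.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical demo: tests the stream BEFORE extracting, as in the question.
std::vector<std::string> firstFieldTestBefore(std::istream& in) {
    std::vector<std::string> out;
    std::string word;
    while (in) {
        in >> word;           // fails at end of file and leaves `word` unchanged
        out.push_back(word);  // so the previous value is pushed a second time
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return out;
}

// Same loop, but the extraction itself is the loop condition.
std::vector<std::string> firstFieldTestAfter(std::istream& in) {
    std::vector<std::string> out;
    std::string word;
    while (in >> word) {      // only enter the body if a value was read
        out.push_back(word);
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return out;
}
```

For a two-line input, the first version returns three elements whether or not the input ends with a newline, while the second returns two. Reaching the end of the file inside `ignore` sets only `eofbit`, and `while (in)` checks for failure, not for end of file, so the failure only shows up on the following extraction.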
