Detect last line of file C++

I've been working on some code for a file parser function to learn some C++:

It's supposed to read in this text file:

>FirstSeq
AAAAAAAAAAAAAA
BBBBBBBBBBBBBB
>SecondSeq
TTTTTTTTTTTTTT
>ThirdSequence
CCCCCCCCCCCCCC
>FourthSequence
GGGGGGGGGGGGGG

and print out the names (lines with '>' at the start) and then the sequences. However, from the output:

AAAAAAAAAAAAAABBBBBBBBBBBBBB
TTTTTTTTTTTTTT
CCCCCCCCCCCCCC
FirstSeq
SecondSeq
ThirdSequence
FourthSequence

We see that the final line of G characters is not included. The code is below. It loops over the lines: if it finds a name, it appends it to the vector of names; if it finds a sequence, it appends it to a temporary string (in case the sequence spans more than one line, like the first sequence). When it then finds the name of the next sequence, it stores the built-up temporary string in a vector, overwrites the temporary string and starts again.

I suspect the problem is in the while loop of the function. The line fullSequence.push_back(currentSeq); is only called after a new name has been detected, to push the old temporary string onto the vector, so it is never called for the last line of G's. The name "FourthSequence" is recorded and the line of G's is read into the temporary string, but the string is never passed to the vector.

So, how can I detect that this is the last line of the file, so that I can make sure the temporary string is pushed onto the vector?

Thanks, Ben.

CODE:

#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void fastaRead(string fileName)
{
    ifstream inputFile;
    inputFile.open(fileName);
    if (inputFile.is_open()) {
        vector<string> fullSequence, sequenceNames;
        string currentSeq;
        string line;
        bool newseq = false;
        bool firstseq = true;
        cout << "Reading Sequence" << endl;
        while (getline(inputFile, line))
        {
            if (line[0] == '>') {
                sequenceNames.push_back(line.substr(1,line.size()));
                newseq = true;
            } else {
                if (newseq == true) {
                    if(firstseq == false){
                        fullSequence.push_back(currentSeq);
                    } else {
                        firstseq = false;
                    }
                    currentSeq = line;
                    newseq = false;
                } else {
                    currentSeq.append(line);
                }
            }
        }
        //Report back the sequences and the sequence names...
        for ( vector<string>::iterator i = fullSequence.begin(); i != fullSequence.end(); i++) {
            cout << *i << endl;
        }
        for ( vector<string>::iterator i = sequenceNames.begin(); i != sequenceNames.end(); i++) {
            cout << *i << endl;
        }
        cout << fullSequence.size() << endl;
        cout << sequenceNames.size() << endl;
        inputFile.close();
    } else {
        perror("error whilst reading this file");
    }
    if(inputFile.bad()){
        perror("error whilst reading this file");
    }
}

int main()
{
    cout << "Fasta Sequence Filepath" << endl;
    string input = "boop.txt";
    fastaRead(input);
    return 0;
}

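getline() only "fails" when it reaches EOF without reading any characters, so the last line you read does go through your loop and ends up in currentSeq. What never happens is the push_back for it, because that only runs once the next sequence starts, and nothing follows the last one.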

I've solved this problem two ways: either by having two flags, or just by processing the last line after the loop.

For two flags, the loop requires both to be true: you set one to false when getline() fails, and you set the other one to false once the first one is false. This gives you one extra pass through the loop after EOF.

Good luck!


 