Setting file pointer position

  • I have a very large text file containing a number of entries arranged in lines.
  • The first word of each line acts as a "key" for me. The other words of the line are numbers.
  • The first word of a line can exist in a large number of other lines as well.

As an example, consider the following sample of the file:

Associative 19 78 45 23 
Disjunctive 23 45 02 200
Associative 23 546 32 56
Conjunctive 22 22 00 3478
Disjunctive 11 934 88 34

My aim:

Perform a certain set of operations on all "Associative", "Disjunctive" and "Conjunctive" lines. The file is very large and is not sorted. I could sort it first using bash, but assume that I would like to avoid that.

My approach:

Step 1: Open the file using std::ifstream.
Step 2: Create an unordered set to store the unique first words.
Step 3: Create a multimap of type multimap<std::string,streampos>.
Step 4: Traverse the file using std::ifstream::ignore, adding the first word of each line to the unordered set, and the stream position of the line, along with its first word, to the multimap.
Step 5: The thought is that this builds a primary index that maps each key to the stream positions of its lines.
Step 6: Go through each element of the unordered set and use multimap::equal_range to look up the stream positions for that key.
Step 7: Traverse those stream positions and perform the operation on each line.

Q1. Is this a correct approach for reading a specific line from a file in C++?

Q2. Below is a basic C++ program that I wrote to test this idea, but it does not work. The program is complete: you can copy and paste the code and run it on the sample text file above to see the output. Specifically, when I set the stream position using seekg and then try to read a line, nothing seems to happen (i.e. the stream position is not changed). The code is as follows:

#include<iostream>
#include<fstream>
#include<limits>
#include<unordered_set>
#include<map>
using namespace std;
int main(int argc,char* argv[])
{
        if (argc<2)
        {
                cout<<"Usage: get_negatives <Full Path of Annotation File> \n"<<endl;
                return 0;
        }

        ifstream fileGT; 
        fileGT.open(argv[1]);//Open the file containing groundtruth annotations
        string filename;
        unordered_set<string> unique_files; //Open this unordered set to uniquely store the file names
        multimap<string,streampos> file_lines; //Open this multimap to store the file names as keys and corresponding line numbers as the values
        streampos filepos = fileGT.tellg();
        fileGT>>filename; 
        unique_files.insert(filename);
        file_lines.insert(pair<string,streampos>(filename,filepos));
        while(!fileGT.eof())
        {
                fileGT.ignore(numeric_limits<streamsize>::max(),'\n');
                filepos = fileGT.tellg();       
                fileGT>>filename;
                unique_files.insert(filename);
                file_lines.insert(pair<string,streampos >(filename,filepos));
        }

        for(auto it=unique_files.begin(); it!=unique_files.end(); ++it)
        {
                pair<multimap<string,streampos>::iterator, multimap<string,streampos>::iterator>range_vals;
                range_vals = file_lines.equal_range(*it);
                for(auto it2=range_vals.first; it2!=range_vals.second; ++it2)
                {
                        fileGT.seekg(it2->second,ios_base::beg);
                        getline(fileGT,filename);       
                        cout<<filename<<endl;
                }
        }


        return -1;

}       

The problem is that seekg() does nothing if the stream's failbit or badbit is set.

You must always call fileGT.clear() before each fileGT.seekg(). Since C++11, seekg() clears eofbit by itself, but it does not clear failbit. In your loop the final fileGT>>filename fails at the end of the file and sets both bits, so the explicit clear() is still needed.
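
To illustrate the point about error bits, here is a minimal sketch. The helper seek_and_read and its clear_first flag are hypothetical names made up for this example; they are not part of the question's code. The idea is that, once a loop has read up to the end of the file, the seek only takes effect when clear() is called first.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post): seek to `pos` and read
// one line from there. Returns false if the seek or the read fails.
bool seek_and_read(std::ifstream& in, std::streampos pos,
                   std::string& line, bool clear_first)
{
    if (clear_first)
        in.clear();   // reset eofbit/failbit left behind by earlier reads
    in.seekg(pos);    // has no effect while failbit or badbit is still set
    return static_cast<bool>(std::getline(in, line));
}
```

Calling it with clear_first set to false on a stream whose last extraction failed should return false, while the same call with clear_first set to true should read the requested line.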

Also, it is a good idea to check for errors after each read:

if (!getline(fileGT, filename))
    //error handling

And, as I said in the comments, if you are going to seek around, you must open the file with std::ios::binary.
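
Putting this advice together, the indexing idea from the question could be sketched roughly as follows. This is only one possible arrangement under some assumptions: the names LineIndex, build_index and line_at are invented here, and a std::map of position vectors stands in for the question's unordered_set plus multimap.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: first word of a line -> start positions of its lines.
using LineIndex = std::map<std::string, std::vector<std::streampos>>;

// Walk the file once and record where every line starts, grouped by key.
LineIndex build_index(std::ifstream& in)
{
    LineIndex index;
    std::string line;
    for (;;) {
        const std::streampos pos = in.tellg();  // position before the line
        if (!std::getline(in, line))
            break;                              // end of file or read error
        std::istringstream fields(line);
        std::string key;
        if (fields >> key)                      // blank lines have no key
            index[key].push_back(pos);
    }
    return index;
}

// Read the line that starts at `pos`; returns false if that fails.
bool line_at(std::ifstream& in, std::streampos pos, std::string& line)
{
    in.clear();      // the indexing pass leaves eofbit/failbit set
    in.seekg(pos);
    return static_cast<bool>(std::getline(in, line));
}
```

The stream would be opened as std::ifstream in(path, std::ios::binary). Note that in binary mode a file with Windows-style line endings keeps a trailing '\r' on each line read this way.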

I haven't tested your code but here are a few changes I would recommend:

  • By convention, main returns 0 on success and a non-zero value (such as 1) on failure. Your program does the reverse: it returns 0 after printing the usage message and -1 at the end of a normal run.

  • Don't use both "\n" and endl unless you really need to; I don't think this is one of those cases.

  • Consider reordering your while loop so that the ignore comes at the end, for example:

std::string buf;
std::ifstream fp("input");
while (fp)
{
  if (fp >> buf) { /* do something with buf */ }
  // Skip the rest of the line; std::numeric_limits needs #include <limits>.
  fp.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

  • Whenever you read from a stream, don't assume that the read succeeded or that the stream is still valid. Check the error flags (using the stream's conversion to bool or fp.good()). Just checking fp.eof() may not always be enough.

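  • If you are using C++11, seekg clears eofbit by itself, so it works after a read that merely reached the end of the file. It does not clear failbit, however, so after a failed read (and always in earlier standards) you need to clear the stream's error bits using fp.clear() first.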

  • If you aren't compiling as C++11 or later, the auto keyword does not deduce types (before C++11 it is a storage-class specifier), so be careful. You might also want to consider const auto&.
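
As a small illustration of why checking only eof() is not enough, the sketch below collects the first word of every line in two ways. Both function names are hypothetical and chosen just for this example. The first loop mirrors the shape of the loop in the question and never checks the extraction; the second lets the extraction itself control the loop.

```cpp
#include <limits>
#include <sstream>   // std::istream, std::istringstream
#include <string>
#include <vector>

// Loop controlled by eof(): the result of the read is never checked.
std::vector<std::string> first_words_eof_loop(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (!in.eof()) {
        in >> word;              // may fail, leaving `word` unchanged
        words.push_back(word);   // so a stale value can be stored again
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return words;
}

// Loop controlled by the extraction: a failed read ends the loop.
std::vector<std::string> first_words_checked(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return words;
}
```

On input that ends with a newline, the eof()-controlled version stores the last key one extra time, because the final extraction fails only after the loop condition has already been tested.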
