How to keep stream position when using gzstream with gzipped file?

I have to deal with large gzip-compressed files. I need to access a subset of the lines, not necessarily in order. My plan was to read through the whole file once, recording the stream position at each line I am interested in, and then use those stream positions to retrieve the information I need quickly.

For this, I am using gzstream. Unfortunately, tellg doesn't seem to work with this wrapper:

#include <cstdlib>   // system
#include <iostream>
#include <fstream>
#include <string>    // string, getline
using namespace std;

#include <gzstream.h>

int main (int argc, char ** argv)
{
  string inFile;
  string line;

  system ("rm -f infile1.txt; echo \"toto1\ntoto2\ntoto3\" > infile1.txt");
  inFile = "infile1.txt";
  ifstream inStream;
  inStream.open (inFile.c_str());
  cout << inStream.tellg () << endl;
  getline (inStream, line);
  cout << inStream.tellg () << endl;
  inStream.close ();

  system ("rm -f infile1.gz; echo \"toto1\ntoto2\ntoto3\" | gzip > infile1.gz");
  inFile = "infile1.gz";
  igzstream igzStream;
  igzStream.open (inFile.c_str());
  cout << igzStream.tellg () << endl;
  getline (igzStream, line);
  cout << igzStream.tellg () << endl;
  igzStream.close ();

  return 0;
}

Compiling and running this code prints:

$ gcc -Wall test.cpp -lstdc++ -lgzstream -lz
$ ./a.out
0
6
18446744073709551615
18446744073709551615

Is there a way to make this work with igzstream? Or should I use Boost gzip filters instead? Any code snippet would be greatly appreciated ;)

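gzstream doesn't support seeking within a file, and seeking is not a particularly efficient operation on a gzipped file anyway. The 18446744073709551615 in your output is -1 printed as an unsigned 64-bit value, which is how tellg reports failure. See this question and its answers: Random access gzip stream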

One of the answers links to example code in the zlib source that you could use as a basis for implementing this feature in gzstream. Another answer suggests a variant compressed format that supports seeking more efficiently.

Boost.Iostreams may support seeking, but gzstream is considerably easier to use and to modify, so I have tended to stick with it.
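
If true random access is not strictly required, one workaround is to avoid seeking altogether: record line numbers instead of stream positions, then collect the wanted lines in a single sequential pass. The sketch below assumes zero-based line numbers; fetch_lines is a hypothetical helper name, not part of gzstream or zlib. It only relies on std::getline, which the code in the question already uses on an igzstream.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not part of gzstream): read `in` sequentially and
// return the lines whose zero-based numbers appear in `wanted`.
// No tellg/seekg is used, so a non-seekable stream is sufficient.
std::map<std::size_t, std::string>
fetch_lines(std::istream& in, const std::set<std::size_t>& wanted)
{
    std::map<std::size_t, std::string> found;
    if (wanted.empty())
        return found;

    // std::set is ordered, so the last element is the largest line number;
    // reading can stop once that line has been seen.
    const std::size_t last = *wanted.rbegin();

    std::string line;
    for (std::size_t n = 0; n <= last && std::getline(in, line); ++n) {
        if (wanted.count(n))
            found[n] = line;
    }
    return found;
}
```

The returned map can then be queried in any order. The cost is one decompression pass up to the highest requested line per call, so this suits batches of lookups better than many isolated ones.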
