
How to keep stream position when using gzstream with gzipped file?

I have to deal with large gzip-compressed files. I need to access a subset of the lines, not necessarily in order. So I was thinking of going through the whole file once, recording the stream position at each line I am interested in, and then using these stream positions to quickly retrieve the information I need.

For this, I am using gzstream. Unfortunately, tellg doesn't seem to work with this wrapper:

#include <cstdlib>   // system
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

#include <gzstream.h>

int main (int argc, char ** argv)
{
  string inFile;
  string line;

  system ("rm -f infile1.txt; echo \"toto1\ntoto2\ntoto3\" > infile1.txt");
  inFile = "infile1.txt";
  ifstream inStream;
  inStream.open (inFile.c_str());
  cout << inStream.tellg () << endl;
  getline (inStream, line);
  cout << inStream.tellg () << endl;
  inStream.close ();

  system ("rm -f infile1.gz; echo \"toto1\ntoto2\ntoto3\" | gzip > infile1.gz");
  inFile = "infile1.gz";
  igzstream igzStream;
  igzStream.open (inFile.c_str());
  cout << igzStream.tellg () << endl;
  getline (igzStream, line);
  cout << igzStream.tellg () << endl;
  igzStream.close ();

  return 0;
}

This code prints the following:

$ gcc -Wall test.cpp -lstdc++ -lgzstream -lz
$ ./a.out
0
6
18446744073709551615
18446744073709551615

Is there a way to make this work with igzstream? Or should I use Boost gzip filters instead? Any code snippet would be greatly appreciated ;)

gzstream doesn't support seeking in a file, and seeking is not a particularly efficient operation on a gzipped file anyway. You can look at this question and its answers: Random access gzip stream

One of the answers links to example code in the zlib source distribution that you could use to help implement the feature you want in gzstream. Another answer suggests a variant compressed format that supports seeking more efficiently.

Boost iostream may support seeking, but gzstream is quite a lot easier to use and to modify, so I've tended to stick with that.
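If you stay with gzstream, one possible workaround is to track positions yourself instead of calling tellg. The sketch below is only an illustration, not part of gzstream: indexLines and readLineAt are hypothetical helper names, and the code assumes nothing beyond a plain std::istream, which an igzstream is. Offsets are counted in uncompressed bytes, and a "seek" is emulated by discarding bytes from a freshly opened stream.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read the stream once and record the offset, in
// uncompressed bytes, at which each line starts.
std::vector<std::streamoff> indexLines(std::istream& in) {
  std::vector<std::streamoff> offsets;
  std::streamoff pos = 0;
  std::string line;
  while (std::getline(in, line)) {
    offsets.push_back(pos);
    pos += static_cast<std::streamoff>(line.size());
    // If getline stopped at '\n' rather than at end of input, the
    // delimiter was consumed as well, so count it.
    if (!in.eof()) pos += 1;
  }
  return offsets;
}

// Hypothetical helper: emulate a seek on a freshly opened stream by
// skipping `offset` bytes, then read the line that starts there.
// Returns false if the offset lies past the end of the data.
bool readLineAt(std::istream& in, std::streamoff offset, std::string& line) {
  in.ignore(static_cast<std::streamsize>(offset));
  return static_cast<bool>(std::getline(in, line));
}
```

With gzstream, you would pass an igzstream to indexLines once, then open a new igzstream on the same file for each lookup and pass it to readLineAt. Every lookup still decompresses everything before the requested offset, so this only avoids parsing, not decompression; visiting the wanted offsets in ascending order in a single pass would likely be cheaper when that is possible.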
