
Using gzip_compressor yields different file sizes

I used gzip_compressor() to produce a compressed output file, and I tried two methods for this. The part common to both is:

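#include <cstdint>
#include <fstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>

std::ofstream traceOut;
traceOut.open("log.gz", std::ios_base::out);
struct traceRec {
  traceRec(uint64_t c) : cycle(c) {}
  uint64_t cycle;
};
void writeTrace(traceRec &rec)
{
  // A new compressing stream is built on every call.
  boost::iostreams::filtering_ostream o;
  o.push(boost::iostreams::gzip_compressor());
  o.push(traceOut);
  // METHOD 1 OR 2
}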

Method 1

I use

 o.write(reinterpret_cast<const char*>(&rec.cycle), sizeof(rec.cycle));

With this implementation, the file size is 380K!

Method 2

I use

 traceOut << rec.cycle << std::endl;

With this implementation, the file size is 78K!

So why do they have different sizes? Another thing is that if I don't use gzip_compressor at all and write directly to the file:

std::ofstream traceOut;
traceOut.open("log.gz", std::ios_base::out);
...
traceOut << rec.cycle << std::endl;

The file size is also 78K.

So there are two problems:

1- Using or not using gzip_compressor has no effect on file size

2- Different implementations for using gzip_compressor yield different file sizes

Any ideas about that?

operator<< most likely writes the textual representation of the number, while the write method writes the variable's complete in-memory size.

So if, for example, cycle is 13, the write call consumes 8 bytes, while the textual representation consumes only 2.
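The byte counts above can be checked with a small standard-library sketch. The helper names binaryRecord, textRecord and checkSizes are hypothetical and not part of the original code. Note that textRecord also counts the newline written by std::endl, so the text form of 13 occupies 3 bytes in the file rather than 2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: the bytes that
// o.write(reinterpret_cast<const char*>(&v), sizeof(v)) would emit.
std::string binaryRecord(std::uint64_t v) {
    std::string out(sizeof(v), '\0');
    std::memcpy(&out[0], &v, sizeof(v));  // raw in-memory representation
    return out;
}

// Hypothetical helper: the characters that `stream << v << std::endl`
// would emit (decimal digits followed by '\n').
std::string textRecord(std::uint64_t v) {
    return std::to_string(v) + '\n';
}

// Checks the sizes discussed in the answer for cycle == 13.
void checkSizes() {
    assert(binaryRecord(13).size() == 8);  // always sizeof(uint64_t)
    assert(textRecord(13).size() == 3);    // "13" plus the newline
}
```

Calling checkSizes() from a test or from main exercises both assertions; the binary size stays at 8 for any value, while the text size grows with the number of digits.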

When compressed, the effect is even more dramatic: numbers written as text use only 10 distinct characters (very low entropy), so the output is highly redundant and compressible.
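As a rough illustration of the small alphabet, the sketch below counts how many distinct byte values appear when the counters 0..999 are written as text and as raw 8-byte values. The function names distinctBytes, textStream and binaryStream are hypothetical, and the range 0..999 is an assumed example, not data from the question. The distinct-byte count is only a crude proxy: it does not predict the actual gzip output size, which also depends on repeated patterns.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <string>

// Number of distinct byte values in a buffer (its "alphabet" size).
std::size_t distinctBytes(const std::string& data) {
    std::set<unsigned char> seen(data.begin(), data.end());
    return seen.size();
}

// Counters 0..count-1 written as decimal text, one per line.
std::string textStream(std::uint64_t count) {
    std::string out;
    for (std::uint64_t v = 0; v < count; ++v) {
        out += std::to_string(v);
        out += '\n';
    }
    return out;
}

// The same counters written as raw 8-byte values.
std::string binaryStream(std::uint64_t count) {
    std::string out;
    for (std::uint64_t v = 0; v < count; ++v) {
        char raw[sizeof(v)];
        std::memcpy(raw, &v, sizeof(v));
        out.append(raw, sizeof(v));
    }
    return out;
}
```

For this assumed range the text stream uses 11 distinct bytes (the ten digits plus the newline) and is 3890 bytes long, while the binary stream uses all 256 byte values and is 8000 bytes long.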

On the other hand, if your cycle counter is often very large (> 99999999, so nine or more digits), the text form needs more bytes per value than the fixed 8 bytes of write, and the write method may then give the smaller file.
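The crossover point can be sketched by counting decimal digits. The helpers decimalDigits and textIsLonger are hypothetical names introduced for illustration. They compare raw, uncompressed sizes only and ignore the trailing newline, so they say nothing definite about the size after gzip.

```cpp
#include <cstdint>

// Number of decimal digits operator<< would print for v.
int decimalDigits(std::uint64_t v) {
    int digits = 1;
    while (v >= 10) {
        v /= 10;
        ++digits;
    }
    return digits;
}

// True when the text form (digits only, no newline) is longer than
// the fixed 8-byte binary form.
bool textIsLonger(std::uint64_t v) {
    return decimalDigits(v) > static_cast<int>(sizeof(std::uint64_t));
}
```

With these definitions, 99999999 still fits in 8 characters, while 100000000 is the first value whose text form exceeds the binary size.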
