Using bzip2 low-level routines to compress chunks of data
The Overview
I am using the low-level calls in the libbzip2 library, BZ2_bzCompressInit(), BZ2_bzCompress() and BZ2_bzCompressEnd(), to compress chunks of data to standard output.
I am migrating working code from higher-level calls, because I have a stream of bytes coming in and I want to compress those bytes in sets of discrete chunks (a discrete chunk is a set of bytes that contains a group of tokens of interest; my input is logically divided into groups of these chunks).
A complete group of chunks might contain, say, 500 chunks, which I want to compress to one bzip2 stream and write to standard output.
Within a set, using the pseudocode I outline below, if my example buffer is able to hold 101 chunks at a time, I would open a new stream and compress 500 chunks in runs of 101, 101, 101, 101, and one final run of 96 chunks that closes the stream.
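The run sizes in this example follow from simple division. A minimal sketch of that arithmetic, assuming chunks are counted rather than measured in bytes; the function names `run_count` and `run_length` are hypothetical and do not appear in the original code:

```c
#include <stddef.h>

/* Number of buffer-sized runs needed to cover total_chunks (ceiling division).
 * Returns 0 if the buffer cannot hold any chunk. */
size_t run_count(size_t total_chunks, size_t buffer_capacity)
{
    if (buffer_capacity == 0)
        return 0;
    return (total_chunks + buffer_capacity - 1) / buffer_capacity;
}

/* Number of chunks in the run with 0-based index run_index.
 * Every run is full except possibly the last; returns 0 past the end. */
size_t run_length(size_t total_chunks, size_t buffer_capacity, size_t run_index)
{
    size_t start = run_index * buffer_capacity;
    size_t remaining;

    if (buffer_capacity == 0 || start >= total_chunks)
        return 0;
    remaining = total_chunks - start;
    return remaining < buffer_capacity ? remaining : buffer_capacity;
}
```

With 500 chunks and a capacity of 101, this gives five runs: four full runs of 101 and a last run of 96, which is the one that would close the stream.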
The Problem
The issue is that my bz_stream structure instance, which keeps track of the number of compressed bytes in a single pass of the BZ2_bzCompress() routine, seems to claim to be writing more compressed bytes than the total bytes in the final, compressed file.
For example, the compressed output could be a file with a true size of 1234 bytes, while the number of reported compressed bytes (which I track while debugging) is somewhat higher than 1234 bytes (say 2345 bytes).
My rough pseudocode is in two parts.
The first part is a rough sketch of what I do to compress a subset of chunks (and I know that I have another subset coming after this one):
bz_stream bzStream;
unsigned char bzBuffer[BZIP2_BUFFER_MAX_LENGTH] = {0};
unsigned long bzBytesWritten = 0UL;
unsigned long long cumulativeBytesWritten = 0ULL;
unsigned char myBuffer[UNCOMPRESSED_MAX_LENGTH] = {0};
size_t myBufferLength = 0;

/* initialize bzStream */
bzStream.next_in = NULL;
bzStream.avail_in = 0U;
bzStream.avail_out = 0U;
bzStream.bzalloc = NULL;
bzStream.bzfree = NULL;
bzStream.opaque = NULL;
int bzError = BZ2_bzCompressInit(&bzStream, 9, 0, 0);

/* bzError checking... */

do
{
    /* read some bytes into myBuffer... */

    /* compress bytes in myBuffer */
    bzStream.next_in = myBuffer;
    bzStream.avail_in = myBufferLength;
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
    do
    {
        bzStream.next_out = bzBuffer;
        bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
        bzError = BZ2_bzCompress(&bzStream, BZ_RUN);

        /* error checking... */

        bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
        cumulativeBytesWritten += bzBytesWritten;

        /* write compressed data in bzBuffer to standard output */
        fwrite(bzBuffer, 1, bzBytesWritten, stdout);
        fflush(stdout);
    }
    while (bzError == BZ_OK);
}
while (/* while there is a non-final myBuffer full of discrete chunks left to compress... */);
Now we wrap up the output:
/* read in the final batch of bytes into myBuffer (with a total byte size of `myBufferLength`)... */

/* compress remaining myBufferLength bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
    bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);

    /* bzError error checking... */

    /* increment cumulativeBytesWritten by `bz_stream` struct `total_out_*` members */
    bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
    cumulativeBytesWritten += bzBytesWritten;

    /* write compressed data in bzBuffer to standard output */
    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzError != BZ_STREAM_END);

/* close stream */
bzError = BZ2_bzCompressEnd(&bzStream);

/* bzError checking... */
The Questions
Am I calculating cumulativeBytesWritten (or, specifically, bzBytesWritten) incorrectly, and how would I fix that? I have been tracking these values in a debug build, and I do not seem to be "double counting" the bzBytesWritten value. This value is counted and used once to increment cumulativeBytesWritten after each successful BZ2_bzCompress() pass.
Am I using the bz_stream state flags correctly? For example, does the following compress and keep the bzip2 stream open, so long as I keep sending some bytes?
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
Likewise, can the following statement compress data so long as at least some bytes are available to read from the bzStream.next_in pointer (BZ_RUN), and then wrap up the stream when no more bytes are available (BZ_FINISH)?
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
There's probably a simple solution to this, but I've been banging my head on the table for a couple of days while debugging what could be wrong, and I'm not making much progress. Thank you for any advice.
In answer to my own question, it appears I am miscalculating the number of bytes written. I should not use the total_out_* members: they hold the running total of bytes output since the stream was initialized, not the number of bytes produced by the latest BZ2_bzCompress() call. The following correction works properly:
bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
The rest of the calculations follow.