Uncompressed file size using zlib's gzip file access function

Using the Linux command-line tool gzip, I can get the uncompressed size of a compressed file with gzip -l.

I couldn't find any function like that in the zlib manual section "gzip File Access Functions".

At this link, http://www.abeel.be/content/determine-uncompressed-size-gzip-file, I found a solution that involves reading the last 4 bytes of the file, but I am avoiding it for now because I prefer to use the library's functions.
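The approach from that link can be sketched in plain C. It reads the gzip ISIZE trailer field described in RFC 1952: the last four bytes of a gzip member, stored little-endian, holding the uncompressed length modulo 2^32. The helper names isize_from_trailer and read_gzip_isize are hypothetical, and the sketch assumes the file is a single gzip member with nothing after it.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode the 4-byte little-endian ISIZE field of a gzip trailer.
   The value is the uncompressed length modulo 2^32. */
static uint32_t isize_from_trailer(const unsigned char t[4])
{
    return (uint32_t)t[0]
         | ((uint32_t)t[1] << 8)
         | ((uint32_t)t[2] << 16)
         | ((uint32_t)t[3] << 24);
}

/* Hypothetical helper: read the last 4 bytes of the file at `path`.
   Returns 0 and stores the decoded value in *isize on success,
   or -1 if the file cannot be opened or is shorter than 4 bytes.
   Assumes a single gzip member with no trailing junk. */
static int read_gzip_isize(const char *path, uint32_t *isize)
{
    unsigned char t[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, -4L, SEEK_END) != 0 || fread(t, 1, 4, f) != 4) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *isize = isize_from_trailer(t);
    return 0;
}
```

For example, a trailer of 10 27 00 00 (hex) decodes to 0x2710, i.e. 10000 bytes. This is only as trustworthy as the assumptions above.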

There is no reliable way to get the uncompressed size of a gzip file without decompressing, or at least decoding the whole thing. There are three reasons.

First, the only information about the uncompressed length is four bytes at the end of the gzip file (stored in little-endian order). By necessity, that is the length modulo $2^{32}$. So if the uncompressed length is 4 GB or more, you won't know what the length is. You can only be certain that the uncompressed length is less than 4 GB if the compressed length is less than something like $2^{32}/1032 + 18$, or around 4 MB. (1032 is the maximum compression factor of deflate.)
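The size bound in that paragraph can be worked through explicitly. This assumes (my reading, not stated in the answer) that the 18 is the minimum gzip overhead from RFC 1952: a 10-byte header plus an 8-byte trailer holding the CRC-32 and ISIZE. Since deflate expands at most 1032-fold, the compressed data bounds the uncompressed length:

$$\text{uncompressed} \le 1032\,(\text{compressed} - 18)$$

$$1032\,(\text{compressed} - 18) < 2^{32} \iff \text{compressed} < \frac{2^{32}}{1032} + 18 = \frac{4\,294\,967\,296}{1032} + 18 \approx 4\,161\,808 \text{ bytes}$$

So a compressed file smaller than roughly 4.16 MB cannot hold $2^{32}$ or more uncompressed bytes, and for such a file the modulo in the trailer field does not lose information.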

Second, and this is worse, a gzip file may actually be a concatenation of multiple gzip streams. Other than decoding, there is no way to find where each gzip stream ends in order to look at the four-byte uncompressed length of that piece. (Which may be wrong anyway due to the first reason.)

Third, gzip files will sometimes have junk after the end of the gzip stream (usually zeros). In that case the last four bytes are not the length.

So gzip -l doesn't really work anyway. As a result, there is no point in providing that function in zlib.

pigz has an option to actually decode the entire input in order to get the true uncompressed length: pigz -lt, which guarantees the right answer. pigz -l does what gzip -l does, which may be wrong.
