
Decompressing bz2 files on Windows

I am trying to decompress a bz2 file with the code snippet below, which is suggested in various places:

bz2_data = bz2.BZ2File(DATA_FILE+".bz2").read()
open(DATA_FILE, 'wb').write(bz2_data)

However, I am getting a much smaller file than I expect.

When I extract the file with the 7z GUI, I get a file of 248 MB. With the code above, however, the file I get is only 879 KB.

When I read the extracted XML file, I can see that the rest of the file is missing, as expected.

I am running Anaconda on a Windows machine, and as far as I understand, bz2 reaches an EOF before the file actually ends.

By the way, I have already tried this and this; neither helped.

If this is a multi-stream file, then Python's bz2 module (before 3.3) doesn't support it:

Note: This class does not support input files containing multiple streams (such as those produced by the pbzip2 tool). When reading such an input file, only the first stream will be accessible. If you require support for multi-stream files, consider using the third-party bz2file module (available from PyPI). This module provides a backport of Python 3.3's BZ2File class, which does support multi-stream files.

As an alternative, the drop-in replacement bz2file should work.
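To illustrate what "only the first stream will be accessible" means, here is a minimal in-memory sketch, assuming Python 3.3 or later. The sample byte strings and variable names are made up for the illustration. It concatenates two compressed streams and compares a single BZ2Decompressor, which stops at the end of the first stream, with bz2.decompress, which reads all of them:

```python
import bz2

# Two independently compressed streams, concatenated the way a
# multi-stream tool such as pbzip2 would lay them out (illustrative data).
stream_a = bz2.compress(b"first part, ")
stream_b = bz2.compress(b"second part")
multi_stream = stream_a + stream_b

# A single BZ2Decompressor handles exactly one stream and then reports EOF.
decomp = bz2.BZ2Decompressor()
first_only = decomp.decompress(multi_stream)
leftover = decomp.unused_data  # bytes that belong to the following stream

# bz2.decompress (Python 3.3+) continues through every stream.
everything = bz2.decompress(multi_stream)

print(len(first_only), len(everything))
```

A truncated result like first_only is consistent with the symptom in the question: the output ends where the first stream ends.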

If it is a multi-stream file, you have to set mode to "r"; otherwise it will fail silently (e.g. output the compressed data as is).

This should do what you want:

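from bz2 import BZ2File

# Write in binary mode and decompress in 100 KiB chunks to keep memory use low.
with open(out_file_path, "wb") as out_file, BZ2File(bz2_file_path, "r") as bz2_file:
    for data in iter(lambda: bz2_file.read(100 * 1024), b""):
        out_file.write(data)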

From the documentation:

If mode is 'r', the input file may be the concatenation of multiple compressed streams.

https://docs.python.org/3/library/bz2.html#bz2.BZ2File
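The same idea can be wrapped in a small helper. This is only a sketch, assuming Python 3.3 or later; the function name decompress_bz2, its parameters, and the default chunk size are hypothetical choices for illustration and do not come from the answers above. It uses bz2.open in read mode and lets shutil.copyfileobj do the chunked copy:

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path, chunk_size=1024 * 1024):
    """Decompress src_path into dst_path, copying chunk_size bytes at a time.

    Hypothetical helper: opens the archive in binary read mode, which on
    Python 3.3+ also reads files made of several concatenated streams.
    """
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

It could be called as decompress_bz2(DATA_FILE + ".bz2", DATA_FILE). Comparing the size of the output with the one produced by 7z is a quick way to check that all streams were read.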
