
How to *properly* compress and decompress a text file using bz2 and python

So I've had this system that scrapes files and compresses them with bz2 for a while now. It does so using the following block of code, which I found on SO a few months back:

Let's assume for the purposes of this post that the filename is always file.XXXX, where XXXX is the relevant extension. We start with .txt

### How to compress a text file
import bz2

filepath_compressed = "file.tar.bz2"
with open("file.txt", 'rb') as data:
    tarbz2contents = bz2.compress(data.read(), 9)
    with bz2.BZ2File(filepath_compressed, 'wb') as f_comp:
        f_comp.write(tarbz2contents)

Now, to decompress it, I've always gotten it to work using decompression software called Keka, which decompresses the .tar.bz2 file to .tar. I then run that through Keka again to get an "extensionless" file, add a .txt extension to it on my Mac, and then it works.

Now, to decompress programmatically, I've tried a few things. I've tried the stuff from this post and the code from this post. I've tried using BZ2Decompressor and BZ2File and everything. I just seem to be missing something and I'm not sure what it is.

Here is what I have so far, and I'd like to know what is wrong with this code:

import bz2, tarfile, shutil

# Decompress to tar
with bz2.BZ2File("file.tar.bz2") as fr, open("file.tar", "wb") as fw:
    shutil.copyfileobj(fr, fw)
    
# Decompress from tar to txt
with tarfile.open("file.tar", "r:") as tar:
    tar.extractall("file_out.txt")

This code crashes with "tarfile.ReadError: truncated header". I think the first context manager outputs a binary text file, and I tried decoding that, but that failed too. What am I missing here? I feel like a noob.


If you would like a minimal runnable piece of code to replicate this, add the following to make a dummy file:

lines = ["Line 1","Line 2", "Line 3"]

with open("file.txt", "w") as f:
    for line in lines:
        f.write(line+"\n")

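The thing that you're making is not a .tar.bz2 file, but rather a .bz2.bz2 file. You are compressing twice with bzip2 (the second time with no effect), and there is no tar file generation anywhere to be seen. bz2.compress() already returns compressed bytes, and writing those bytes through bz2.BZ2File compresses them again. That is also why tarfile fails: after one round of decompression, "file.tar" still holds a bzip2 stream, not a tar archive.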
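To make that concrete, here is a minimal sketch of three options. It is not the only way to do this. The helper names (compress_bz2, decompress_bz2, recover_double_bz2, make_tar_bz2, read_member_from_tar_bz2) and the file.txt.bz2 / recovered.txt / real.tar.bz2 filenames are made up for illustration. The recovery helper assumes the existing archives were written exactly as in the question, i.e. bz2.compress() output written through bz2.BZ2File.

```python
import bz2
import os
import shutil
import tarfile
import tempfile


def compress_bz2(src, dst):
    """Compress src into a plain .bz2 file: one bzip2 pass, no tar involved."""
    with open(src, "rb") as f_in, bz2.open(dst, "wb", compresslevel=9) as f_out:
        shutil.copyfileobj(f_in, f_out)


def decompress_bz2(src, dst):
    """Inverse of compress_bz2."""
    with bz2.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


def recover_double_bz2(src, dst):
    """Recover files written the question's way (assumed: bzip2 applied twice)."""
    with bz2.open(src, "rb") as f_in:
        inner = f_in.read()  # after one pass this is still a bzip2 stream
    with open(dst, "wb") as f_out:
        f_out.write(bz2.decompress(inner))  # second pass yields the text


def make_tar_bz2(src, dst):
    """Only if a real .tar.bz2 is wanted: let tarfile build the archive."""
    with tarfile.open(dst, "w:bz2") as tar:
        tar.add(src, arcname=os.path.basename(src))


def read_member_from_tar_bz2(archive, member_name):
    """Read one regular file out of a .tar.bz2 archive as bytes."""
    with tarfile.open(archive, "r:bz2") as tar:
        return tar.extractfile(member_name).read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file.txt")
        with open(src, "w") as f:
            f.write("Line 1\nLine 2\nLine 3\n")

        # Plain bz2 round trip
        compress_bz2(src, os.path.join(tmp, "file.txt.bz2"))
        decompress_bz2(os.path.join(tmp, "file.txt.bz2"),
                       os.path.join(tmp, "file_out.txt"))

        # Real tar.bz2 round trip
        make_tar_bz2(src, os.path.join(tmp, "real.tar.bz2"))
        print(read_member_from_tar_bz2(os.path.join(tmp, "real.tar.bz2"),
                                       "file.txt").decode())
```

For files that already exist on disk, recover_double_bz2("file.tar.bz2", "recovered.txt") would be the call to try. Also note that the argument to tar.extractall() is a destination directory, so extractall("file_out.txt") in the question would create a directory with that name rather than a text file.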
