
Decompress bz2 files

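I want to decompress files that sit in different directories under different paths. The code is below, and the error is "invalid data stream". Please help me. Thank you very much.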

import sys
import os
import bz2
from bz2 import decompress

path = "Dir"
for(dirpath,dirnames,files)in os.walk(path):
   for file in files:
       filepath = os.path.join(dirpath,filename)
       newfile = bz2.decompress(file)
       newfilepath = os.path.join(dirpath,newfile)

bz2.compress / bz2.decompress work with binary data:

>>> import bz2
>>> compressed = bz2.compress(b'test_string')
>>> compressed
b'BZh91AY&SYJ|i\x05\x00\x00\x04\x83\x80\x00\x00\x82\xa1\x1c\x00 \x00"\x03h\x840"P\xdf\x04\x99\xe2\xeeH\xa7\n\x12\tO\x8d \xa0'
>>> bz2.decompress(compressed)
b'test_string'

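In short, you need to handle the file contents yourself. If you have very large files, prefer bz2.BZ2Decompressor over bz2.decompress, because the latter requires you to hold the entire file in a byte array.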

for filename in files:
    filepath = os.path.join(dirpath, filename)
    newfilepath = os.path.join(dirpath, filename + '.decompressed')
    with open(newfilepath, 'wb') as new_file, open(filepath, 'rb') as file:
        decompressor = bz2.BZ2Decompressor()
        # read the archive in 100 KiB chunks until read() returns b''
        for data in iter(lambda: file.read(100 * 1024), b''):
            new_file.write(decompressor.decompress(data))
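
The loop above can also be packaged as a standalone function. The following is a minimal sketch, not a definitive implementation: the name decompress_in_chunks and its chunk_size parameter are made up for illustration. It assumes the archive holds a single bz2 stream, since bz2.BZ2Decompressor does not transparently handle multi-stream input.

```python
import bz2


def decompress_in_chunks(src_path, dst_path, chunk_size=100 * 1024):
    """Stream-decompress src_path into dst_path and return the bytes written.

    Hypothetical helper: assumes src_path contains exactly one bz2 stream.
    """
    decompressor = bz2.BZ2Decompressor()
    written = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # iter(callable, sentinel) stops once read() returns b'' at end of file.
        for chunk in iter(lambda: src.read(chunk_size), b''):
            written += dst.write(decompressor.decompress(chunk))
    return written
```

Only one compressed chunk and its decompressed output are held in memory at a time, which is the point of preferring the incremental decompressor for large files.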

You can also use bz2.BZ2File to make this simpler:

for filename in files:
    filepath = os.path.join(dirpath, filename)
    newfilepath = os.path.join(dirpath, filename + '.decompressed')
    with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
        for data in iter(lambda : file.read(100 * 1024), b''):
            new_file.write(data)

bz2.decompress takes compressed data and inflates it. You are passing it the file name, not the data in the file!

Do this instead:

with bz2.BZ2File(filepath) as archive:  # open the compressed file
    data = archive.read()  # get the decompressed data
newfilepath = filepath[:-4]  # assuming the filepath ends with .bz2
with open(newfilepath, 'wb') as out:
    out.write(data)  # write the uncompressed file
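
To see the difference between a file name and file data concretely, here is a small sketch. try_decompress is a hypothetical helper and example.txt.bz2 is a made-up name. On CPython 3, a file name passed as bytes is expected to be rejected with an OSError (the "invalid data stream" from the question), while real compressed bytes round-trip.

```python
import bz2


def try_decompress(data):
    """Return the decompressed bytes, or the name of the exception raised."""
    try:
        return bz2.decompress(data)
    except (OSError, TypeError, ValueError) as exc:
        return type(exc).__name__


# A file name encoded as bytes is not a bz2 stream, so this call fails.
name_result = try_decompress(b'example.txt.bz2')

# The compressed bytes themselves decompress back to the original data.
data_result = try_decompress(bz2.compress(b'test_string'))
```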

This should work:

for filename in files:
    archive_path = os.path.join(dirpath, filename)
    outfile_path = os.path.join(dirpath, filename[:-4])  # strip the .bz2 suffix
    with open(archive_path, 'rb') as source, open(outfile_path, 'wb') as dest:
        dest.write(bz2.decompress(source.read()))

This is much faster for large files, because it writes the output incrementally instead of holding the whole decompressed file in memory:

import bz2,shutil
filepath = 'test.txt.bz2'
with bz2.BZ2File(filepath) as fr, open(filepath[:-4],"wb") as fw:
    shutil.copyfileobj(fr,fw)
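
Combined with the os.walk loop from the question, the same idea could look like the sketch below. decompress_tree and its suffix parameter are assumptions made for illustration. It skips files that do not end in .bz2 and writes each output next to its archive with the suffix removed.

```python
import bz2
import os
import shutil


def decompress_tree(root, suffix='.bz2'):
    """Decompress every file ending in suffix under root; return output paths.

    Hypothetical helper: outputs are written beside the archives and
    overwrite any existing file of the same name.
    """
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue  # skip files that are not bz2 archives
            src_path = os.path.join(dirpath, filename)
            dst_path = src_path[:-len(suffix)]
            with bz2.BZ2File(src_path) as src, open(dst_path, 'wb') as dst:
                # copyfileobj copies in fixed-size chunks, so memory use stays small
                shutil.copyfileobj(src, dst)
            written.append(dst_path)
    return written
```

Calling decompress_tree("Dir") would process the directory used in the question. The archives themselves are left in place.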

Credit to https://stackoverflow.com/a/49073452/3427777

