
Decompress bz2 files in 30,000 subfolders with Python os.walk?

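I have 30,000 folders, and each folder contains 5 bz2 files of JSON data.

I'm trying to use os.walk() to loop through the file paths, decompress each compressed file, and save the result in the original directory.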

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)

I get the following error when I run the code.

File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

I have read that there can be issues running code that uses the decompressor method on a Mac. Or am I missing something else?

It looks like you might be trying to decompress your already-decompressed results. You should filter them out.

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        # filter out decompressed files
        if filename.endswith('.decompressed'):
            continue

        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:

            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
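
The same filtering idea can be turned around: rather than skipping known outputs, only open files that look like inputs. The sketch below is one possible variant, not the method from the answer above. It assumes the compressed files carry a `.bz2` suffix (the question does not say so), and the function name `decompress_tree` and its `suffix` parameter are made up for illustration. With a suffix filter, earlier `.decompressed` outputs and any other non-bzip2 file that happens to sit in the tree (for example a `.DS_Store` file created by Finder on macOS) are never passed to the decompressor, and a file that still fails is recorded instead of stopping the whole run.

```python
import bz2
import os
import shutil


def decompress_tree(root, suffix=".bz2"):
    """Decompress every file under root whose name ends with suffix.

    Hypothetical helper: the ".bz2" suffix is an assumption about how
    the input files are named. Each output is written next to its
    source with the suffix stripped.

    Returns (done, failed): output paths written, and input paths
    that could not be decompressed.
    """
    done, failed = [], []
    for dirpath, _dirnames, files in os.walk(root):
        for filename in files:
            # Skip everything that is not named like a bzip2 file,
            # including outputs left over from an earlier run.
            if not filename.endswith(suffix):
                continue

            filepath = os.path.join(dirpath, filename)
            newfilepath = filepath[:-len(suffix)]

            try:
                with bz2.open(filepath, "rb") as src, \
                        open(newfilepath, "wb") as dst:
                    # Stream in 100 KiB chunks, as in the original loop.
                    shutil.copyfileobj(src, dst, 100 * 1024)
            except (OSError, EOFError):
                # OSError: invalid data stream; EOFError: truncated file.
                # Remove the partial output and keep going.
                if os.path.exists(newfilepath):
                    os.remove(newfilepath)
                failed.append(filepath)
            else:
                done.append(newfilepath)
    return done, failed
```

Calling `done, failed = decompress_tree("/Users/mac/PycharmProjects/OSwalk/Data")` and then printing `failed` would show which inputs, if any, raised the "Invalid data stream" error, which makes it easier to check whether those files are really bzip2 data.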

