
Decompress bz2 files in 30,000 subfolders with Python os.walk?

I have 30,000 folders, and each folder contains 5 bz2-compressed JSON files.

I'm trying to use os.walk() to traverse the directory tree, decompress each compressed file, and save the decompressed output in the same directory as the original.

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        # Stream the decompressed data into a new file next to the original.
        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            # Read 100 KiB at a time until read() returns b'' (end of file).
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)

I get the following error when I run the code:

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

I've read that the bz2 decompressor can have problems on macOS. Is that the issue, or am I missing something else?

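It looks like you might be trying to decompress your already decompressed results: most likely .decompressed files left over from an earlier run, which contain plain JSON rather than bz2 data, so reading them through BZ2File fails with "Invalid data stream". (os.walk() reads each directory's file list before your loop body runs, so a single pass won't pick up the files it creates itself.) You should filter them out.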
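That diagnosis can be checked in isolation. The sketch below is a minimal reproduction that doesn't touch the real folders: it compresses a small made-up JSON document with bz2.compress(), then hands both the compressed bytes and the raw JSON bytes to bz2.decompress(). The helper name try_decompress and the sample JSON are hypothetical. The compressed bytes decompress fine, while the raw JSON bytes are rejected with the same "Invalid data stream" OSError as in the traceback, because they don't start with a bz2 header.

```python
import bz2
import json


def try_decompress(data: bytes) -> str:
    """Hypothetical helper: 'ok' if data is a valid bz2 stream, else the OSError message."""
    try:
        bz2.decompress(data)
    except OSError as exc:
        return str(exc)
    return "ok"


# Stand-in for one of the original files: some JSON text, bz2-compressed.
original = json.dumps({"id": 1, "text": "hello"}).encode("utf-8")
compressed = bz2.compress(original)

print(try_decompress(compressed))  # real bz2 data decompresses fine
print(try_decompress(original))    # already-decompressed bytes are rejected
```

Since the failure comes from the bz2 data check itself, this points at the input files rather than at anything macOS-specific.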

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        # filter out decompressed files
        if filename.endswith('.decompressed'):
            continue

        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

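        # Stream the decompressed data into a new file next to the original.
        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            # Read 100 KiB at a time until read() returns b'' (end of file).
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)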
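A stricter variant of the same loop whitelists instead of blacklisting: it only processes files whose extension is .bz2. This is a minimal sketch assuming the compressed files are named like data.json.bz2 (the question doesn't show the actual filenames); it writes data.json next to each one, so its outputs are never picked up again on a re-run, and it also skips unrelated files such as the .DS_Store files that macOS Finder creates. The function name decompress_tree, the CHUNK_SIZE constant, the temporary .part file, and the (done, failed) return value are all hypothetical choices. It uses bz2.open() and shutil.copyfileobj() for the chunked copy and records files that still fail instead of aborting the whole walk.

```python
import bz2
import os
import shutil

CHUNK_SIZE = 100 * 1024  # same 100 KiB chunk size as the original loop


def decompress_tree(root):
    """Decompress every *.bz2 file under root into the same directory.

    Hypothetical helper. Assumes names like 'data.json.bz2'; the output
    drops the '.bz2' suffix ('data.json'). Returns (done, failed), where
    failed holds (path, error message) pairs.
    """
    done, failed = [], []
    for dirpath, dirnames, files in os.walk(root):
        for filename in files:
            stem, ext = os.path.splitext(filename)
            # Whitelist: only files ending in .bz2. This skips earlier
            # outputs as well as unrelated files such as .DS_Store.
            if ext != '.bz2':
                continue
            src = os.path.join(dirpath, filename)
            dst = os.path.join(dirpath, stem)
            tmp = dst + '.part'  # written first, renamed only on success
            try:
                with bz2.open(src, 'rb') as fin, open(tmp, 'wb') as fout:
                    shutil.copyfileobj(fin, fout, CHUNK_SIZE)
                os.replace(tmp, dst)
            except (OSError, EOFError) as exc:
                # Corrupt, truncated or mislabeled file: record it, remove
                # the partial output, and carry on with the rest of the tree.
                failed.append((src, str(exc)))
                if os.path.isfile(tmp):
                    os.remove(tmp)
                continue
            done.append(dst)
    return done, failed


if __name__ == '__main__':
    done, failed = decompress_tree("/Users/mac/PycharmProjects/OSwalk/Data")
    print(len(done), "decompressed,", len(failed), "failed")
    for path, error in failed:
        print(path, error)
```

Writing to a temporary name and renaming with os.replace() means a crash halfway through never leaves a truncated data.json that looks complete, and collecting failures means a few bad archives among the 150,000 files are reported at the end rather than stopping the run.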

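Disclaimer: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.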

 
© 2020-2024 STACKOOM.COM