
How can I decompress bz2 files in 30,000 subfolders with Python's os.walk()?

I have 30,000 folders, and each folder contains five bz2 files of JSON data.

I'm trying to use os.walk() to traverse the directory tree, decompress each compressed file, and save the output in its original directory.

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            # read and write in 100 KiB chunks
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)

Running the code gives the following error:

File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

I've read that the bz2 decompressor can have issues on macOS. Is that the cause, or am I missing something else?

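It looks like you might be trying to decompress your already-decompressed output files. For example, if the script is run again, the `.decompressed` files written by the earlier run are listed by os.walk() and passed to BZ2File, which raises OSError because they are not bz2 data. You should filter them out.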
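A minimal sketch of that failure mode, assuming a throwaway temporary directory and a hypothetical helper `read_bz2()`: once a file has been decompressed, its output is plain JSON, and handing it back to `bz2.BZ2File` raises the same `OSError` seen in the question.

```python
import bz2
import os
import tempfile


def read_bz2(path):
    """Hypothetical helper: return the decompressed contents of a bz2 file."""
    with bz2.BZ2File(path, "rb") as f:
        return f.read()


# Throwaway directory holding one compressed JSON file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.json.bz2")
with bz2.open(src, "wb") as f:
    f.write(b'{"key": "value"}')

# First pass: decompress it next to the original, as the loop does.
out = src + ".decompressed"
with open(out, "wb") as f:
    f.write(read_bz2(src))

# Second pass: the plain-text output is not bz2 data, so reading it fails.
try:
    read_bz2(out)
except OSError as exc:
    print("OSError:", exc)  # prints: OSError: Invalid data stream
```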

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        # filter out decompressed files
        if filename.endswith('.decompressed'):
            continue

        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            # read and write in 100 KiB chunks to keep memory use low
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
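A slightly stricter variant, sketched under the assumption that every compressed file ends in `.bz2`: process only those files and write each output with the `.bz2` suffix removed (so `data.json.bz2` becomes `data.json`, a naming choice that differs from the `.decompressed` suffix above). Because the outputs no longer end in `.bz2`, rerunning the script skips them, and unrelated files such as macOS's `.DS_Store` are ignored as well. `decompress_tree()` is a hypothetical helper name; `shutil.copyfileobj()` performs the same chunked read/write as the `iter(lambda: ...)` loop.

```python
import bz2
import os
import shutil


def decompress_tree(root, chunk_size=100 * 1024):
    """Hypothetical helper: decompress every *.bz2 file under root in place.

    Only files ending in '.bz2' are processed, so earlier outputs and
    unrelated files (e.g. macOS .DS_Store) are skipped. Existing outputs
    are overwritten. Returns the list of output paths that were written.
    """
    written = []
    for dirpath, dirnames, files in os.walk(root):
        for filename in files:
            if not filename.endswith(".bz2"):
                continue
            src = os.path.join(dirpath, filename)
            # "data.json.bz2" -> "data.json"
            dst = os.path.join(dirpath, filename[:-len(".bz2")])
            with bz2.BZ2File(src, "rb") as fin, open(dst, "wb") as fout:
                # Stream in chunks so large files are never fully in memory.
                shutil.copyfileobj(fin, fout, chunk_size)
            written.append(dst)
    return written


# Usage: decompress_tree("/Users/mac/PycharmProjects/OSwalk/Data")
```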
