Decompress bz2 files in 30,000 subfolders with Python os.walk?
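I have 30,000 folders, and each folder contains 5 bz2-compressed files of JSON data.
I am trying to use os.walk() to iterate over the file paths, decompress each compressed file, and save the result in its original directory.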
import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')
        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
I get the following error when I run the code:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
I have read that running code that uses the decompressor on a Mac can cause problems. Is that the cause, or am I missing something else?
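"Invalid data stream" is what the bz2 decompressor reports when its input is not a valid bzip2 stream, so a quick way to see what is going wrong is to list the files in the tree that are not bz2 at all. The sketch below relies on the fact that every bzip2 stream begins with the bytes BZh; the helper names looks_like_bz2 and find_non_bz2 are hypothetical, and the path is the one from the question. On macOS, the .DS_Store files that Finder creates inside folders are one example of a non-bz2 file that a plain os.walk() loop would pass to BZ2File.

```python
import os

BZ2_MAGIC = b"BZh"  # every bzip2 stream starts with these three bytes


def looks_like_bz2(filepath):
    """Return True if the file starts with the bzip2 magic bytes."""
    with open(filepath, "rb") as f:
        return f.read(3) == BZ2_MAGIC


def find_non_bz2(root):
    """Yield every file path under root that is not a bzip2 stream."""
    for dirpath, dirnames, files in os.walk(root):
        for filename in files:
            filepath = os.path.join(dirpath, filename)
            if not looks_like_bz2(filepath):
                yield filepath


if __name__ == "__main__":
    path = "/Users/mac/PycharmProjects/OSwalk/Data"
    for bad in find_non_bz2(path):
        print("not bz2:", bad)
```

Any path this prints would raise the same OSError if it were opened with BZ2File.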
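It looks like you may be trying to decompress files that are already decompressed output, for example .decompressed files left in the same folders by an earlier run of the script. Because os.walk() visits those folders too, these files are passed to BZ2File, which fails because they are not bz2 data. You should filter them out: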
import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        # filter out decompressed files
        if filename.endswith('.decompressed'):
            continue
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')
        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            # decompress in 100 KiB chunks to limit memory use
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
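If every compressed file ends in .bz2 (an assumption, since the question does not show the file names), processing only those files is a bit more robust than skipping .decompressed, because it also ignores stray files such as macOS .DS_Store. The sketch below is a hedged alternative, not the answer's method: it swaps os.walk() for pathlib's Path.rglob(), strips the .bz2 suffix for the output name, skips outputs that already exist so the script can be rerun, and streams the data with shutil.copyfileobj(). The function name decompress_tree is hypothetical.

```python
import bz2
import shutil
from pathlib import Path


def decompress_tree(root):
    """Decompress every *.bz2 file under root next to the original.

    Hypothetical helper: assumes compressed files end in '.bz2' and writes
    the output with that suffix stripped (e.g. 'a.json.bz2' -> 'a.json').
    Returns the list of output files written by this call.
    """
    root = Path(root)
    written = []
    if not root.is_dir():
        return written
    # Build the full list first so files created below are never revisited.
    for src in sorted(root.rglob("*.bz2")):
        if not src.is_file():
            continue
        dst = src.with_suffix("")
        if dst.exists():
            # Already decompressed by an earlier run, so reruns are safe.
            continue
        tmp = dst.with_name(dst.name + ".part")
        with bz2.open(str(src), "rb") as fin, open(str(tmp), "wb") as fout:
            # Stream in 100 KiB chunks instead of reading the whole file.
            shutil.copyfileobj(fin, fout, 100 * 1024)
        tmp.replace(dst)  # rename only after the write has completed
        written.append(dst)
    return written


if __name__ == "__main__":
    out = decompress_tree("/Users/mac/PycharmProjects/OSwalk/Data")
    print("decompressed", len(out), "files")
```

Writing to a .part file and renaming it at the end means an interrupted run does not leave a truncated output that the exists() check would wrongly skip on the next run.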