Unzipping a number of files iteratively using Python
I have 2 TB of data, and I have to unzip the files to do some analysis. However, due to limited hard-disk space, I cannot unzip all of the files at once. I am thinking of unzipping the first 2000 files, running the analysis, and then repeating this for the next 2000. How can I do that?
import os, glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'
#x = os.listdir(path)[:2000]
for folder in glob.glob(root):
    if folder.endswith(extension):  # check for ".zip" extension
        try:
            print(folder)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))
        except:
            pass
How about this:
import os
import glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

# list comp of all '.zip' folders
folders = [folder for folder in glob.glob(root) if folder.endswith(extension)]

# only executes while there are folders remaining to be processed
while folders:
    # only grabs the next 2000 folders if there are at least that many
    if len(folders) >= 2000:
        temp = folders[:2000]
    # otherwise gets all the remaining (e.g. if only 1152 were left)
    else:
        temp = folders[:]
    # list comp that rebuilds with elements not pulled into 'temp'
    folders = [folder for folder in folders if folder not in temp]
    # this was all your code, I just swapped 'x' in place of 'folder'
    for x in temp:
        try:
            print(x)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, x)).extractall(os.path.join(directory, os.path.splitext(x)[0]))
        except:
            pass
This builds a temporary list of the '.zip' files, then removes those elements from the original list. The only drawback is that folders is modified, so if you need it anywhere else, it will end up empty.
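As an alternative, the batching can also be done with list slicing, which leaves the original list intact and avoids rebuilding it on every pass. The sketch below is one way to structure it, not the answer's exact code; the names batches and extract_in_batches are illustrative, and catching zipfile.BadZipFile replaces the bare except so that real errors are not silently swallowed:

```python
import glob
import os
import zipfile


def batches(seq, size):
    """Yield successive slices of at most `size` items from `seq`."""
    for i in range(0, len(seq), size):
        # slicing past the end is safe in Python, so no length check is needed
        yield seq[i:i + size]


def extract_in_batches(root, batch_size=2000):
    # hypothetical driver mirroring the answer's loop
    folders = [f for f in glob.glob(root) if f.endswith(".zip")]
    for batch in batches(folders, batch_size):
        for path in batch:
            try:
                with zipfile.ZipFile(path) as zf:
                    zf.extractall(os.path.splitext(path)[0])
            except zipfile.BadZipFile as exc:
                print(f"skipping {path}: {exc}")
        # run the analysis on this batch here, then delete the
        # extracted files to free disk space before the next batch
```

Because batches only slices, folders is never mutated and can still be used after the loop finishes.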