Unzipping a number of files iteratively using Python
I have 2 TB of data, and I have to unzip the files to do some analysis. However, because of limited hard disk space, I cannot unzip all of the files at once.
What I had in mind is unzipping the first two thousand of them, doing my analysis, and then repeating this for the next 2000. How could I do that?
import os, glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'
#x = os.listdir(path)[:2000]

for folder in glob.glob(root):
    if folder.endswith(extension):  # check for ".zip" extension
        try:
            print(folder)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))
        except:
            pass
What about this?
import os
import glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

# list comp of all '.zip' folders
folders = [folder for folder in glob.glob(root) if folder.endswith(extension)]

# only executes while there are folders remaining to be processed
while folders:
    # only grabs the next 2000 folders if there are at least that many
    if len(folders) >= 2000:
        temp = folders[:2000]
    # otherwise gets all the remaining (i.e. 1152 were left)
    else:
        temp = folders[:]

    # list comp that rebuilds with elements not pulled into 'temp'
    folders = [folder for folder in folders if folder not in temp]

    # this was all your code, I just swapped 'x' in place of 'folder'
    for x in temp:
        try:
            print(x)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, x)).extractall(os.path.join(directory, os.path.splitext(x)[0]))
        except:
            pass
This makes a temporary list of the .zip files and then removes those elements from the original list. The only drawback is that folders gets modified, so it will eventually be empty if you ever need to use it elsewhere.
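If keeping the original list intact matters, one possible variation is to walk over it in slices instead of rebuilding it on every pass. The sketch below also deletes each batch's extracted files once the analysis is done, on the assumption that freeing disk space between batches is the goal. The names batches, process_in_batches and the analyze callback are hypothetical and not part of the code above; treat this as a rough outline rather than a tested solution for the 2 TB data set.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def batches(items, size):
    """Yield consecutive slices of `items` without modifying it."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(archives, work_dir, analyze, batch_size=2000):
    """Extract one batch, call `analyze` on it, then delete the extracted files.

    `analyze` is a hypothetical callback that receives the directory
    holding the current batch's extracted contents.
    """
    work_dir = Path(work_dir)
    for batch in batches(archives, batch_size):
        # A fresh scratch directory per batch keeps cleanup to one rmtree call.
        scratch = Path(tempfile.mkdtemp(dir=work_dir))
        try:
            for archive in batch:
                archive = Path(archive)
                try:
                    with zipfile.ZipFile(archive) as zf:
                        # Each archive gets its own subfolder named after the file.
                        zf.extractall(scratch / archive.stem)
                except zipfile.BadZipFile:
                    print(f"skipping unreadable archive: {archive}")
            analyze(scratch)
        finally:
            # Free the disk space before the next batch is extracted.
            shutil.rmtree(scratch, ignore_errors=True)
```

With the paths from the question, a call might look like process_in_batches(sorted(Path('C:\\Users\\X').glob('*.zip')), 'C:\\Users\\X\\to_save', my_analysis), where my_analysis stands in for whatever analysis function you already have. Because the batches are taken by slicing, the list of archives is still complete afterwards.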