Unzipping a number of files iteratively using Python

I have 2 TB of data, and I have to unzip the files to do some analysis. However, because of limited hard disk space, I cannot unzip all of the files at once. My idea is to unzip the first two thousand, run my analysis, and then repeat for the next 2000. How could I do this?

import os, glob
import zipfile


root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

#x = os.listdir(path)[:2000]
for folder in glob.glob(root):
    if folder.endswith(extension): # check for ".zip" extension
        try:
            print(folder)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))

        except:
            pass

What about this?

import os
import glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

# list comp of all '.zip' files (glob already returns full paths)
folders = [folder for folder in glob.glob(root) if folder.endswith(extension)]

# only executes while there are folders remaining to be processed
while folders:
    # grabs the next 2000 folders, or all that remain if fewer (i.e. 1152 were left)
    temp = folders[:2000]

    # drops the elements that were pulled into 'temp'
    folders = folders[2000:]

    # your extraction loop, with 'x' in place of 'folder' and errors reported
    for x in temp:
        try:
            print(x)
            # 'x' is already a full path, so no os.path.join or os.chdir is needed
            with zipfile.ZipFile(x) as archive:
                archive.extractall(os.path.splitext(x)[0])
        except (zipfile.BadZipFile, OSError) as err:
            # report the failure instead of silently ignoring it
            print('could not extract', x, err)

    # run the analysis on this batch here, then delete the extracted files
    # to free disk space before the next batch

This makes a temporary list of the .zip files and then removes those elements from the original list. The only drawback is that folders gets modified, so it will eventually be empty if you need to use it elsewhere.
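
Since the constraint in the question is disk space, each batch also needs to be deleted after it has been analyzed. Below is a minimal sketch of that full cycle, not a definitive implementation. The names batches, process_in_batches, work_dir and the analyze callback are hypothetical and stand in for whatever the real analysis is. It assumes the archives sit directly in one directory and that each batch fits on disk once extracted.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(zip_dir, work_dir, analyze, batch_size=2000):
    """Extract, analyze, and clean up one batch of archives at a time.

    `analyze` is a hypothetical callback that receives the directory
    holding the extracted batch; replace it with the real analysis.
    """
    archives = sorted(Path(zip_dir).glob("*.zip"))
    for batch in batches(archives, batch_size):
        # A fresh scratch directory per batch, created inside work_dir.
        scratch = Path(tempfile.mkdtemp(dir=work_dir))
        try:
            for archive in batch:
                try:
                    with zipfile.ZipFile(archive) as zf:
                        # Each archive gets its own sub-directory.
                        zf.extractall(scratch / archive.stem)
                except zipfile.BadZipFile as err:
                    print(f"skipping {archive}: {err}")
            analyze(scratch)
        finally:
            # Delete the extracted files so the next batch has room.
            shutil.rmtree(scratch, ignore_errors=True)
```

A call could look like process_in_batches('C:\\Users\\X', 'C:\\Users\\X\\to_save', my_analysis), where my_analysis is your own function and the to_save directory already exists. Because the list of archives is only sliced and never rebuilt, it is still intact after the loop finishes.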
