
Unzipping a number of files iteratively using Python

I have 2 TB of data and I have to unzip the files to do some analysis. However, because of hard-disk space limits I cannot unzip all the files at once. I was thinking of unzipping the first 2000, running the analysis, and then repeating this for the next 2000. How can I do that?

import os, glob
import zipfile


root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

#x = os.listdir(path)[:2000]
for folder in glob.glob(root):
    if folder.endswith(extension): # check for ".zip" extension
        try:
            print(folder)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))

        except:
            pass

How about this?:

import os
import glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

# list comp of all '.zip' folders
folders = [folder for folder in glob.glob(root) if folder.endswith(extension)]

# only executes while there are folders remaining to be processed
while folders:
    # only grabs the next 2000 folders if there are at least that many
    if len(folders) >= 2000:
        temp = folders[:2000]
    # otherwise gets all the remaining (i.e. 1152 were left)
    else:
        temp = folders[:]

    # list comp that rebuilds with elements not pulled into 'temp'
    folders = [folder for folder in folders if folder not in temp]

    # this was all your code, I just swapped 'x' in place of 'folder'
    for x in temp:
        try:
            print(x)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, x)).extractall(os.path.join(directory, os.path.splitext(x)[0]))
        except:
            pass

This builds a temporary list of .zip files and then removes those elements from the original list. The only downside is that folders gets modified, so if you need it somewhere else it will end up empty.
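If you want to keep folders intact, a minimal sketch of the same batching idea using index slices instead of rebuilding the list (the batch size of 2000 and the "run the analysis here" placeholder are illustrative, not part of the original code):

import os
import glob
import zipfile

root = 'C:\\Users\\X\\*'
extension = ".zip"
batch_size = 2000  # assumed batch size, adjust to your disk space

# collect all .zip paths once; this list is never modified
folders = [folder for folder in glob.glob(root) if folder.endswith(extension)]

# step through the list in fixed-size slices
for start in range(0, len(folders), batch_size):
    batch = folders[start:start + batch_size]
    for path in batch:
        try:
            # extract each archive into a folder named after it
            zipfile.ZipFile(path).extractall(os.path.splitext(path)[0])
        except zipfile.BadZipFile:
            # skip unreadable archives instead of silently swallowing every error
            print("Skipping bad archive:", path)
    # ... run the analysis on this batch and free the disk space here ...

Slicing with range() avoids the O(n^2) membership test of the list comprehension and leaves folders available for later use.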
