
Unzip folder by chunks in Python

I have a big zip file containing many files that I'd like to unzip by chunks to avoid consuming too much memory.

I tried to use the Python module zipfile, but I didn't find a way to load the archive by chunks and extract it to disk.

Is there a simple way to do that in Python?

EDIT

@steven-rumbalski correctly pointed out that zipfile already handles big files properly: it unzips the files one by one without loading the full archive.

My problem here is that my zip file is on AWS S3 and my EC2 instance cannot load such a big file in RAM, so I download it by chunks and would like to unzip it chunk by chunk.

You don't need a special way to extract a large archive to disk. The source Lib/zipfile.py shows that zipfile is already memory efficient. Creating a zipfile.ZipFile object does not read the whole file into memory. Rather, it just reads in the table of contents (the central directory) for the ZIP file. ZipFile.extractall() extracts files one at a time using shutil.copyfileobj(), copying from a subclass of io.BufferedIOBase.
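To make the member-by-member behaviour concrete, here is a minimal sketch that copies each archive member to disk through a bounded buffer, roughly in the spirit of what extractall() does. The function name extract_streaming and the buffer_size parameter are hypothetical, and the sketch assumes a trusted archive: it does not perform the path sanitization that ZipFile.extract() applies to member names.

```python
import os
import shutil
import zipfile


def extract_streaming(zip_path, target_dir, buffer_size=64 * 1024):
    """Extract every member of zip_path into target_dir, copying at most
    buffer_size bytes at a time. Returns the names of the extracted files.

    Assumption: the archive is trusted (member names are not sanitized).
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        # infolist() only needs the central directory, not the file data.
        for info in zf.infolist():
            dest = os.path.join(target_dir, info.filename)
            if info.is_dir():
                os.makedirs(dest, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            # zf.open() returns a file-like object that decompresses lazily.
            with zf.open(info) as src, open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst, buffer_size)
            extracted.append(info.filename)
    return extracted
```

Memory use is bounded by buffer_size plus the decompressor's internal state, regardless of how large the archive or any single member is.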

If all you want to do is a one-time extraction, Python provides a shortcut from the command line:

python -m zipfile -e archive.zip target-dir/

You can use zipfile (or possibly tarfile) to extract only a range of archive members at a time, as follows:

import zipfile

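def extract_chunk(fn, directory, ix_begin, ix_end):
    # Extract members ix_begin (inclusive) to ix_end (exclusive)
    # of the archive fn into directory.
    with zipfile.ZipFile(fn, 'r') as zf:
        infos = zf.infolist()
        # Clamp the requested range to the members that actually exist.
        for ix in range(max(0, ix_begin), min(ix_end, len(infos))):
            zf.extract(infos[ix], directory)
    # No explicit close() is needed: the with block closes the archive.

directory = "path"
extract_chunk("{}/file.zip".format(directory), directory, 0, 50)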
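If the goal is to work through the whole archive in batches rather than a single index range, the same idea can be wrapped in a loop. This is only a sketch: extract_in_batches and batch_size are hypothetical names, not part of zipfile.

```python
import zipfile


def extract_in_batches(zip_path, target_dir, batch_size=50):
    """Extract batch_size members at a time, yielding the member names
    of each batch after it has been written to target_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
        for start in range(0, len(infos), batch_size):
            batch = infos[start:start + batch_size]
            for info in batch:
                zf.extract(info, target_dir)
            yield [info.filename for info in batch]
```

Because this is a generator, nothing is extracted until it is iterated, for example with `for names in extract_in_batches("archive.zip", "out"):`, which leaves room to process or delete each batch before the next one is written.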
