Unzip a folder by chunks in Python

Question

I have a big zip file containing many files that I'd like to unzip by chunks to avoid consuming too much memory.

I tried to use the Python module zipfile, but I didn't find a way to load the archive by chunks and extract it to disk.

Is there a simple way to do that in Python?

EDIT

@steven-rumbalski correctly pointed out that zipfile already handles big files properly: it unzips the files one by one without loading the full archive.

My problem here is that my zip file is on AWS S3 and my EC2 instance cannot hold such a big file in RAM, so I download it by chunks and would like to unzip it by chunks as well.

Answer 1

You don't need a special way to extract a large archive to disk. The source, Lib/zipfile.py, shows that zipfile is already memory efficient. Creating a zipfile.ZipFile object does not read the whole file into memory; it only reads the table of contents of the ZIP file. ZipFile.extractall() extracts files one at a time, using shutil.copyfileobj() to copy from a subclass of io.BufferedIOBase.
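To make the mechanism concrete, here is a minimal sketch of the same idea written out by hand: open each member as a stream and copy it to disk through a fixed-size buffer. The function name extract_streaming, the buffer_size parameter, and the path check are illustrative assumptions, not part of zipfile; in practice ZipFile.extractall() already does the equivalent work.

```python
import shutil
import zipfile
from pathlib import Path


def extract_streaming(archive_path, target_dir, buffer_size=1024 * 1024):
    """Extract every member, holding at most buffer_size bytes of data at a time.

    Hypothetical helper for illustration; ZipFile.extractall() is the
    supported way to do this.
    """
    target = Path(target_dir).resolve()
    written = []
    # Opening the archive reads only its table of contents.
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            dest = (target / info.filename).resolve()
            # Refuse members that would land outside the target directory.
            if dest != target and target not in dest.parents:
                raise ValueError("unsafe member path: {}".format(info.filename))
            if info.is_dir():
                dest.mkdir(parents=True, exist_ok=True)
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            # zf.open() returns a file-like object that decompresses lazily,
            # so copyfileobj() moves the data buffer_size bytes at a time.
            with zf.open(info) as src, open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst, buffer_size)
            written.append(dest)
    return written
```

Calling extract_streaming("archive.zip", "target-dir") would return the list of files written. Peak memory is governed by buffer_size rather than by the size of the archive or of any single member.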

If all you want to do is a one-time extraction, Python provides a shortcut from the command line:

python -m zipfile -e archive.zip target-dir/

Answer 2

You can use zipfile (or possibly tarfile) to extract a range of archive members selected by index, as follows:

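import zipfile

def extract_chunk(fn, directory, ix_begin, ix_end):
    """Extract members ix_begin..ix_end-1 of the zip file fn into directory."""
    with zipfile.ZipFile(fn, 'r') as zf:
        # infolist() reads only the table of contents, not the file data.
        infos = zf.infolist()
        # Clamp the requested range to the members that actually exist.
        for ix in range(max(0, ix_begin), min(ix_end, len(infos))):
            zf.extract(infos[ix], directory)
    # No explicit close() is needed: the with block closes the archive.

directory = "path"
extract_chunk("{}/file.zip".format(directory), directory, 0, 50)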
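The call above handles only one window of members. A minimal sketch of how the whole archive could be covered window by window follows; the names iter_batches and extract_in_batches and the default batch_size are assumptions made for illustration. Note that batching limits how many files are written per step, not how much memory is used, since zipfile already streams each member.

```python
import zipfile


def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def extract_in_batches(zip_path, directory, batch_size=50):
    """Extract all members of zip_path into directory, batch_size at a time.

    Hypothetical helper; returns the number of members extracted.
    """
    count = 0
    with zipfile.ZipFile(zip_path) as zf:
        for batch in iter_batches(zf.infolist(), batch_size):
            for info in batch:
                zf.extract(info, directory)
            # A natural place to log progress or pause between batches.
            count += len(batch)
    return count
```

For example, extract_in_batches("path/file.zip", "path", batch_size=50) would process the archive 50 members at a time until every member has been extracted.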

Answer 1
4 2017-02-23 15:03:35

Answer 2
0 2017-02-23 15:03:27
