Unzip folder by chunks in Python
I have a big zip file containing many files that I'd like to unzip in chunks to avoid consuming too much memory.
I tried to use the Python module zipfile, but I didn't find a way to load the archive chunk by chunk and extract it to disk.
Is there a simple way to do that in Python?
EDIT
@steven-rumbalski correctly pointed out that zipfile handles big files correctly by unzipping the files one by one without loading the full archive.
My problem here is that my zip file is on AWS S3 and my EC2 instance cannot load such a big file into RAM, so I download it in chunks and would like to unzip it chunk by chunk.
You don't need a special way to extract a large archive to disk. The source Lib/zipfile.py shows that zipfile is already memory efficient. Creating a zipfile.ZipFile object does not read the whole file into memory. Rather, it just reads in the table of contents for the ZIP file. ZipFile.extractall() extracts files one at a time using shutil.copyfileobj(), copying from a subclass of io.BufferedIOBase.
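To make the member-by-member behaviour concrete, here is a minimal sketch of the same idea written out by hand: each member is opened as a stream and copied to disk with shutil.copyfileobj() using a bounded buffer. The function name extract_streaming and the chunk_size default are my own choices for illustration, not part of zipfile.

```python
import shutil
import zipfile
from pathlib import Path


def extract_streaming(archive_path, target_dir, chunk_size=64 * 1024):
    """Extract every member, holding at most chunk_size bytes of data at once.

    Hypothetical helper for illustration. Unlike ZipFile.extract(), it does
    not sanitize member names, so only use it on archives you trust.
    """
    target = Path(target_dir)
    extracted = []
    # Opening the archive reads only its table of contents.
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            out_path = target / info.filename
            if info.is_dir():
                out_path.mkdir(parents=True, exist_ok=True)
                continue
            out_path.parent.mkdir(parents=True, exist_ok=True)
            # zf.open() returns a file-like object that decompresses lazily.
            with zf.open(info) as src, open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
            extracted.append(info.filename)
    return extracted
```

In practice ZipFile.extractall() already does this for you; the sketch is only useful if you want control over the buffer size or the output paths.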
If all you want to do is a one-time extraction, Python provides a shortcut from the command line:
python -m zipfile -e archive.zip target-dir/
You can use zipfile (or possibly tarfile) as follows:
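    import zipfile

    def extract_chunk(fn, directory, ix_begin, ix_end):
        """Extract the archive members with index ix_begin <= ix < ix_end."""
        # Open the archive passed in as fn; only its table of contents is read here.
        with zipfile.ZipFile(fn, 'r') as zf:
            infos = zf.infolist()
            # Clamp the requested range to the members that actually exist.
            for ix in range(max(0, ix_begin), min(ix_end, len(infos))):
                zf.extract(infos[ix], directory)
        # No explicit zf.close() is needed: the with block closes the archive.

    directory = "path"
    extract_chunk("{}/file.zip".format(directory), directory, 0, 50)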
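For the case in the question's EDIT, where the archive lives on remote storage and cannot be loaded whole, one possible approach is to hand zipfile a seekable file-like object that fetches only the byte ranges it is asked for. The sketch below is an assumption-laden illustration, not an existing library API: RangeReader and its fetch(start, end) callback are hypothetical names, and fetch would have to be backed by something like a ranged GET against the S3 object.

```python
import io
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a byte-range fetcher.

    fetch(start, end) is a hypothetical callback that must return the bytes
    in [start, end) of the remote object; size is the object's total length.
    """

    def __init__(self, fetch, size):
        super().__init__()
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("invalid whence: {!r}".format(whence))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        # Fetch only the requested range, clipped to the end of the object.
        end = min(self._pos + len(buffer), self._size)
        if end <= self._pos:
            return 0
        data = self._fetch(self._pos, end)
        n = len(data)
        buffer[:n] = data
        self._pos += n
        return n


def open_remote_zip(fetch, size):
    """Return a ZipFile that pulls bytes through fetch() on demand."""
    return zipfile.ZipFile(RangeReader(fetch, size))
```

With this, zf.extract() or zf.open() on a single member should only trigger fetches for the table of contents and that member's data. Each read becomes one fetch call, so wrapping the reader in io.BufferedReader may be worthwhile to cut down on round trips.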