
How to extract the content of `*.tar.gz` dynamically

I have quite a big `*.tar.gz` file (10 GB) that contains individual files (no sub-folders). In Jupyter Notebook it takes several hours to untar this archive. Once all files are extracted, I need to upload them to a storage location.

This is what I currently have:

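```python
import tarfile

# tarfile.open handles the gzip layer; TarFile() alone only reads uncompressed tar
with tarfile.open(tarfilename, "r:gz") as untar:
    untar.extractall()  # extracts everything before any upload can start
```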

Is it possible to extract the content of a `*.tar.gz` dynamically (i.e. continuously)? Something like this:

```python
# Desired behaviour (pseudocode): handle each file as soon as it is read
with open(tarfilename, "r") as tararchive:
    for eachfile in tararchive:
        save_to_storage_location(eachfile)
```

So, instead of waiting until the whole archive has been untarred, I just want to "open" it and move its contents one by one into the storage location.

A little tinkering with the `tarfile` package shows that you can list all the files in the tar archive and extract them individually. Without more information on what you want to do with the files afterwards (i.e. where or how you want to upload them), I can't help with that part.
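To confirm that members really can be handled one at a time, here is a minimal, self-contained sketch. It builds a tiny `.tar.gz` in memory (the file names and contents are made up for the demo) and then reads each member's bytes through `TarFile.extractfile`, which returns a file-like object rather than writing anything to disk. The helper names `build_demo_archive` and `read_members` are illustrative only.

```python
import io
import tarfile


def build_demo_archive() -> bytes:
    """Create a small in-memory .tar.gz with two flat files (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in [("a.txt", b"hello"), ("b.txt", b"world")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_members(archive: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for every regular file in the archive."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:  # members are yielded in archive order
            if member.isfile():
                fobj = tar.extractfile(member)  # file-like object; nothing touches disk
                contents[member.name] = fobj.read()
    return contents


if __name__ == "__main__":
    print(read_members(build_demo_archive()))
```

For a real 10 GB archive you would not call `.read()` on large members; passing the file object to a chunked copy or upload keeps memory bounded.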

You can loop through the files like so:

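```python
import tarfile

# "r" uses transparent compression, so the gzip layer is handled automatically
with tarfile.open("path/to/file.tar.gz", "r") as file:
    # Iterating the TarFile yields members in archive order, instead of
    # getnames() first scanning the entire archive and extract() then seeking back
    for member in file:
        print(member.name)
        file.extract(member)  # writes just this file to the current working directory
```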

After the `extract` call, that individual file is sitting in your current working directory, so you can do something with it (e.g. upload it) before moving on to the next one.
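For a 10 GB archive it may be worth going one step further and not writing the extracted files to the working directory at all. The sketch below opens the archive in stream mode (`"r|gz"`), which decompresses it strictly front to back in a single pass, and copies each member straight to its destination with `shutil.copyfileobj`. It assumes the "storage location" is a local or mounted directory; `stream_to_storage` and `dest_dir` are hypothetical names. For a cloud bucket you would swap the `open(...)`/`copyfileobj` step for whatever upload-from-file-object call your storage client provides.

```python
import shutil
import tarfile
from pathlib import Path


def stream_to_storage(archive_path: str, dest_dir: str) -> list[str]:
    """Copy every regular file in a .tar.gz to dest_dir, one member at a time.

    Hypothetical helper: dest_dir stands in for the real storage location.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # "r|gz" = sequential stream mode: no seeking, each member is read exactly once
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            src = tar.extractfile(member)  # only valid for the current member in stream mode
            # Path(...).name drops any directory part (the archive is flat anyway)
            target = dest / Path(member.name).name
            with open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)  # chunked copy, so memory use stays bounded
            copied.append(target.name)
    return copied
```

Because stream mode cannot go backwards, each file must be fully handled before the loop advances; in exchange, the archive is decompressed only once and nothing accumulates on local disk.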
