
Go through tar archive in memory to extract metadata?

Original 2015-06-10 17:10:14 4 1 python/ zip/ tar

I have several tar archives that I need to extract/read in memory. The problem is that each tar contains many ZIP archives, and each ZIP contains unique XML documents.

So the structure of each tar is as follows: tar -> directories -> ZIPs -> XML.

Obviously I can manually extract a single TAR, but I have about 1000 TAR archives that are about 3 GB each and contain about 6000 ZIP archives each. I'm looking for a way to handle the .tar archives in memory and extract the XML data of each ZIP. Is there a way to do this?

1 answer

This should be doable, since all of the relevant methods have non-disk-related options.

Lots of loops here, so let's dig in.

For each tar archive:

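tarfile.open opens the tar archive; it accepts either a path or, via its fileobj argument, a file object that is already in memory. ( Docs )
Call .getmembers on the resulting TarFile instance to get a list of the zips (or other files) contained in the archive. The list also includes directory entries, so filter for the members you want. ( Docs )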
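
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the helper name list_zip_members is hypothetical, and it assumes the zips can be recognized by a ".zip" suffix and that the tar is supplied as a binary file object.

```python
import tarfile


def list_zip_members(tar_fileobj):
    """Return the TarInfo entries whose names end in .zip.

    tar_fileobj is assumed to be a binary file object (for example an
    io.BytesIO or a file opened with open(path, "rb")).
    """
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(fileobj=tar_fileobj, mode="r:*") as tar:
        # getmembers() returns every entry, directories included, so keep
        # only regular files that look like zips (suffix check is an
        # assumption about how the archives are named).
        return [
            member
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(".zip")
        ]
```

Note that getmembers has to scan the whole archive before it returns, which may matter for archives of several gigabytes; iterating over the TarFile object directly is an alternative.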

For each zip within the tar archive:

Once you know what member file (i.e., one of your zips) you want to look through, call .extractfile on your TarFile instance to get a file object for that zip. Note that .extractfile returns None for members that are not regular files, such as directories. ( Docs )
Instantiate a new zipfile.ZipFile with your file object in order to open the zip so you can work with it. ( Docs )
Call .infolist on your ZipFile instance to get a list of the files it contains (including your XML files). ( Docs )
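
As a rough sketch of the zip-level steps, assuming the TarFile is already open and was opened from a seekable source (zipfile needs to seek within the file object it is given). The helper name list_zip_contents is hypothetical.

```python
import tarfile
import zipfile


def list_zip_contents(tar, member):
    """Return the file names inside one zip stored in an open TarFile."""
    # extractfile() gives a file object for the member without writing
    # anything to disk; it returns None for directories and similar.
    zip_fileobj = tar.extractfile(member)
    if zip_fileobj is None:
        return []
    # ZipFile accepts a file object in place of a path.
    with zip_fileobj, zipfile.ZipFile(zip_fileobj) as zf:
        return [info.filename for info in zf.infolist()]
```

The member argument can be either a TarInfo object or the member's name as a string.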

For each XML file within the zip:

Call .open on your ZipFile instance in order to get a file object of one of your XML files. ( Docs )
You now have a file object corresponding to one of your XML files. Do whatever you want with it: .read it, copy it to disk somewhere, stick it in an ElementTree ( docs ), etc.
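
Putting all three loops together, a minimal end-to-end sketch might look like the following. It makes several assumptions that are not in the question: the generator name iter_xml_roots is hypothetical, zips and XML files are identified by their ".zip" and ".xml" suffixes, and each XML document is small enough to parse fully with ElementTree.

```python
import tarfile
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_roots(tar_fileobj):
    """Yield (zip_name, xml_name, root_element) for each XML file found.

    tar_fileobj is assumed to be a seekable binary file object.
    """
    with tarfile.open(fileobj=tar_fileobj, mode="r:*") as tar:
        # Iterating the TarFile visits members one at a time.
        for member in tar:
            if not (member.isfile() and member.name.lower().endswith(".zip")):
                continue
            # Open the zip straight from the tar, with no extraction to disk.
            with tar.extractfile(member) as zip_fileobj, \
                    zipfile.ZipFile(zip_fileobj) as zf:
                for info in zf.infolist():
                    if not info.filename.lower().endswith(".xml"):
                        continue
                    # zf.open() returns a readable file object for the entry.
                    with zf.open(info) as xml_fileobj:
                        root = ET.parse(xml_fileobj).getroot()
                    yield member.name, info.filename, root
```

For a tar on disk, something like `with open(path, "rb") as f: for zip_name, xml_name, root in iter_xml_roots(f): ...` would drive it; only the zip currently being read needs to be held open at any time.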



solution1 0 ACCEPTED 2015-06-10 17:27:07
