I have several tar archives that I need to extract/read in memory. The problem is that each tar contains many ZIP archives, each of which contains unique XML documents.
So the structure of each tar is as follows: tar -> directories -> ZIPs -> XML.
Obviously I can manually extract a single tar, but I have about 1000 tar archives that are about 3 GB each and contain about 6000 ZIP archives each. I'm looking for a way to handle the .tar archives in memory and extract the XML data from each ZIP. Is there a way to do this?
This should be doable, since all of the relevant methods can work with file objects instead of paths on disk.
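To make the file-object point concrete, here is a minimal sketch that builds a tar containing a ZIP containing one XML document entirely in memory (`io.BytesIO`) and then reads it back without touching the disk. The helper names `make_zip_bytes`/`make_tar_bytes` and the member names `some_dir/archive.zip` and `doc.xml` are hypothetical, made up only for this demo.

```python
import io
import tarfile
import zipfile


def make_zip_bytes(files):
    """Hypothetical demo helper: build a ZIP in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def make_tar_bytes(files):
    """Hypothetical demo helper: build an uncompressed tar in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# A tar holding one ZIP, which holds one XML document; nothing is written to disk.
zip_bytes = make_zip_bytes({"doc.xml": b"<root><item>42</item></root>"})
tar_bytes = make_tar_bytes({"some_dir/archive.zip": zip_bytes})

# Reading it back: tarfile.open and zipfile.ZipFile both accept file objects.
with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
    member = tar.getmember("some_dir/archive.zip")
    zip_file = tar.extractfile(member)  # a file object, not a path
    with zipfile.ZipFile(zip_file) as zf:
        xml_data = zf.read("doc.xml")

print(xml_data)  # b'<root><item>42</item></root>'
```

In your case the outer `io.BytesIO(tar_bytes)` would simply be a path to the real tar file; everything below that level already stays in memory.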
Lots of loops here, so let's dig in.
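For each tar archive:

- `tarfile.open` to open the tar archive.
- `.getmembers` on the resulting `TarFile` instance to get a list of the zips (or other files) contained in the archive. Since your tars also contain directories, this list includes directory entries too, so filter on `.isfile()` and the `.zip` extension.
- For each zip within the tar archive:
  - `.extractfile` on your `TarFile` instance to get a file object for that zip.
  - `zipfile.ZipFile` with that file object, to open the zip so you can work with it.
  - `.infolist` on your `ZipFile` instance to get a list of the files it contains (including your XML files).
  - For each XML file within the zip:
    - `.open` on your `ZipFile` instance to get a file object for that XML file (or `.read` to get its contents as bytes), which you can pass directly to an XML parser such as `xml.etree.ElementTree.parse`.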