
Python: how to create tar file and compress it on the fly with external module, using different compression methods not available in tarfile module?

I'm trying to write code that packs a few big files (tens to hundreds of gigabytes each) into one archive. The compression methods supported by the tarfile module are a bit slow for that much data, so I would like to use an external compression module such as lz4 to achieve better compression speed. Unfortunately, I can't find a way to create a tar file and compress it with lz4 on the fly, without first creating a temporary tar file. The tarfile documentation says that an uncompressed stream can be opened for writing with the 'w|' mode. Is that the way to stream a tar file directly to the lz4 module? If so, what is the proper way to use it? Thank you very much.

Per our conversation above.

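import subprocess
import tarfile

# Start lz4 reading from stdin ('-'). The compressed data goes to lz4's stdout,
# so redirect it as needed (for example by passing stdout=<an open file>).
p = subprocess.Popen(['lz4', '-'], stdin=subprocess.PIPE)

# 'w|' opens an uncompressed tar stream that writes straight into lz4's stdin.
tar = tarfile.open(fileobj=p.stdin, mode="w|")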

From there you can do the usual tar.addfile. When you are done, call tar.close(), then p.stdin.close() and p.wait(), so that lz4 sees the end of its input and finishes writing. FYI, as I stated in the conversation: GNU tar can auto-detect gz and bz2 but not lz4. So to extract the files you have to run lz4 -c -d stdin.lz4 | tar xf - . If you simply ran tar xf on the archive, it would fail.
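Putting the pieces above together, a minimal sketch could look like the following. The function name stream_tar_through, its parameters, and the output file name in the usage note are made up for illustration and are not from the answer above; the compressor is passed in as a command so that any tool that reads stdin and writes stdout can be used.

```python
import subprocess
import tarfile


def stream_tar_through(cmd, paths, out_path):
    """Stream a tar archive of `paths` into `cmd` and save its output.

    `cmd` is any command that reads raw data on stdin and writes the
    compressed result to stdout (hypothetical helper, for illustration).
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=out)
        try:
            # 'w|' writes an uncompressed tar stream without seeking,
            # which is what a pipe requires.
            with tarfile.open(fileobj=proc.stdin, mode="w|") as tar:
                for path in paths:
                    tar.add(path)
        finally:
            # Closing stdin signals end of input to the compressor.
            proc.stdin.close()
            returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

Assuming the lz4 command-line tool is installed, a call such as stream_tar_through(['lz4', '-c', '-'], ['big1.bin', 'big2.bin'], 'mypack.tar.lz4') should produce the compressed archive without any temporary tar file.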

You can pipe the output of the tar command directly to the lz4 utility, which avoids any intermediate file. Here is an example (assuming you have both tar and lz4 installed on your system):

tar cvf - * | lz4 > mypack.tar.lz4

The - here tells tar to write the archive to stdout. Of course, you can replace the * with whichever files or directories you want to archive.

The reverse operation is also possible :

lz4 -d -c mypack.tar.lz4 | tar xvf -
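The same reverse operation can be sketched in Python by reading the decompressor's stdout with tarfile's 'r|' stream mode. The function name extract_tar_from and its parameters are made up for illustration; this is a rough sketch, and extractall should only be used on archives you trust.

```python
import subprocess
import tarfile


def extract_tar_from(cmd, dest_dir):
    """Run `cmd`, read a tar stream from its stdout, extract into `dest_dir`.

    `cmd` is any command that writes an uncompressed tar stream to stdout
    (hypothetical helper, for illustration).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # 'r|' reads the tar stream sequentially, without seeking.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            tar.extractall(dest_dir)
        # Drain any trailing padding so the child is not cut off mid-write.
        while proc.stdout.read(65536):
            pass
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

Assuming lz4 is installed, extract_tar_from(['lz4', '-d', '-c', 'mypack.tar.lz4'], 'output_dir') would correspond to the shell pipeline above.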
