
Compressing only the files inside a given directory using tarfile (Python)

I have written the following script, which compresses src (either a single file or a directory) into the target directory dst:

#!/usr/bin/env python2

import tarfile
from ntpath import basename, dirname
from os import path, listdir, makedirs, chdir
import errno
import sys

class Archivator:
    @staticmethod
    def compress(src='input/test', dst='output'):
        # if not path.isfile(src_file):
        #     print('Expecting absolute path to file (not directory) as "src". If "src" does contain a file, the file does not exist')
        #     return False

        if not path.isdir(dst):
            return False
            # try:
            #     makedirs(dst_dir)
            # except OSError as err:
            #     if err.errno != errno.EEXIST:
            #         return False

        filename = basename(src) if path.isdir(src) else src
        tar_file = dst + '/' + filename + '.tar.gz'
        print(tar_file)
        print(src)
        with tarfile.open(tar_file, 'w:gz') as tar:
            print('Creating archive "' + tar_file + '"')
            # chdir(dirname(dst_dir))
            recr = path.isdir(src)
            if recr:
                print('Source is a directory. Will compress all contents using recursion')
            tar.add(src, recursive=recr)

        return True


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(description='Uses tar to compress file')
    parser.add_argument('-src', '--source', type=str,
                        help='Absolute path to file (not directory) that will be compressed')
    parser.add_argument('-dst', '--destination', type=str, default='output/',
                        help='Path to output directory. Create archive inside the directory will have the same name as value of "--src" argument')

    # Generate configuration
    config = parser.parse_args()

    Archivator.compress(config.source, config.destination)

For single files I haven't had an issue so far. However, while compressing src as a directory does work (recursion and all), I have noticed a very annoying issue: the complete directory structure leading to src is replicated inside the tar.gz archive.

Example:

Let's say I have the following file structure:

./
 |---compression.py (script above)
 |
 |---updates/
 |       |
 |       |---package1/
 |               |
 |               |---file1
 |               |---file2
 |               |---dir/
 |                     |
 |                     |---file3
 |
 |---compressed/

With src = 'updates/package1' and dst = 'compressed', I expect the resulting archive to

  • be placed inside dst (this works)
  • contain file1, file2 and dir/ directly at its top level

For the second point, I expect:

./
 |---compression.py (script above)
 |
 |---updates/
 |       |
 |       |---package1/
 |               |
 |               |---file1
 |               |---file2
 |               |---dir/
 |                    |
 |                    |---file3
 |
 |---compressed/
          |
          |---package1.tar.gz
                 |
                 |---file1
                 |---file2
                 |---dir/
                      |
                      |---file3

but instead I get

./
 |---compression.py (script above)
 |
 |---updates/
 |       |
 |       |---package1/
 |               |
 |               |---file1
 |               |---file2
 |               |---dir/
 |                    |
 |                    |---file3
 |
 |---compressed/
         |
         |---package1.tar.gz
                 |
                 |---updates/
                        |
                        |---package1/
                                |
                                |---file1
                                |---file2
                                |---dir/
                                     |
                                     |---file3

While the solution might be really trivial, I can't seem to figure it out. I even tried chdir-ing into src (if it is a directory), but that didn't work. Some of my experiments even led to an OSError (because a directory was missing where it was expected) and a corrupted archive.

First, you are using the recursive parameter incorrectly.

According to the official documentation of tarfile:

def add(self, name, arcname=None, recursive=True, exclude=None):
    """Add the file `name' to the archive. `name' may be any type of file
       (directory, fifo, symbolic link, etc.). If given, `arcname'
       specifies an alternative name for the file in the archive.
       Directories are added recursively by default. This can be avoided by
       setting `recursive' to False. `exclude' is a function that should
       return True for each filename to be excluded.
    """

You can use arcname to specify an alternative name for the file inside the archive. recursive only controls whether the contents of a directory are added recursively; since that is the default, you do not need to pass it.
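To make the difference between the two parameters concrete, here is a minimal sketch (Python 3). The temporary layout, the archive name 'pkg' and the helper member_names are made up for illustration; only tarfile.open, TarFile.add and TarFile.getnames come from the library.

```python
import os
import tarfile
import tempfile

# Hypothetical layout for illustration: <tmp>/updates/package1/file1
base = tempfile.mkdtemp()
src = os.path.join(base, "updates", "package1")
os.makedirs(src)
with open(os.path.join(src, "file1"), "w") as fh:
    fh.write("demo")


def member_names(recursive):
    """Archive `src` under the name 'pkg' and return the stored member names."""
    archive = os.path.join(base, "demo_%s.tar.gz" % recursive)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname replaces the on-disk path; recursive decides whether the
        # directory's contents are added along with the directory entry.
        tar.add(src, arcname="pkg", recursive=recursive)
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


only_dir_entry = member_names(recursive=False)  # the directory entry alone
with_contents = member_names(recursive=True)    # directory entry plus its files
print(only_dir_entry)
print(with_contents)
```

In both cases the 'updates/package1' prefix is gone, which is the effect of arcname, not of recursive.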

tarfile can directly add a directory.

Back to your question: you can add each file manually and specify its arcname. For example, tar.add("updates/package1/file1", "file1").
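Applied to the whole directory, the per-entry approach might look like the sketch below. The function name compress_contents and the throwaway directory tree are assumptions made for the example; the tree mirrors the layout from the question.

```python
import os
import tarfile
import tempfile


def compress_contents(src_dir, tar_path):
    """Add each entry of src_dir under its own name, dropping the parent path."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for entry in sorted(os.listdir(src_dir)):
            # arcname is the name stored in the archive; subdirectories are
            # still added recursively under that name.
            tar.add(os.path.join(src_dir, entry), arcname=entry)


# Demo on a throwaway tree mirroring the question's layout.
base = tempfile.mkdtemp()
src = os.path.join(base, "updates", "package1")
os.makedirs(os.path.join(src, "dir"))
for rel in ("file1", "file2", os.path.join("dir", "file3")):
    with open(os.path.join(src, rel), "w") as fh:
        fh.write("demo")

out_dir = os.path.join(base, "compressed")
os.makedirs(out_dir)
tar_path = os.path.join(out_dir, "package1.tar.gz")
compress_contents(src, tar_path)

with tarfile.open(tar_path, "r:gz") as tar:
    names = tar.getnames()
print(names)
```

Because the archive is written outside src_dir, it cannot end up inside itself.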

Update:

Alternatively, you can set arcname to an empty string, which omits the root directory from the paths stored in the archive.
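A rough sketch of the empty-arcname variant, again on a made-up temporary tree. How the root directory entry itself is recorded in the archive is not something this sketch relies on, so it only inspects the regular-file members.

```python
import os
import tarfile
import tempfile

# Hypothetical tree: <tmp>/updates/package1/{file1, dir/file3}
base = tempfile.mkdtemp()
src = os.path.join(base, "updates", "package1")
os.makedirs(os.path.join(src, "dir"))
for rel in ("file1", os.path.join("dir", "file3")):
    with open(os.path.join(src, rel), "w") as fh:
        fh.write("demo")

tar_path = os.path.join(base, "package1.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    # With an empty arcname, child paths are built relative to the archive
    # root instead of being prefixed with "updates/package1".
    tar.add(src, arcname="")

with tarfile.open(tar_path, "r:gz") as tar:
    file_members = sorted(m.name for m in tar.getmembers() if m.isfile())
print(file_members)
```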

I basically used .replace to strip the base folder path from each file's path and passed the result as arcname.

import os
import tarfile

with tarfile.open(tar_path, tar_compression) as tar_handle:
    for root, dirs, files in os.walk(test_data_path):
        for file in files:
            # Strip the base folder prefix so archive paths are relative to it.
            tar_handle.add(os.path.join(root, file), arcname=os.path.join(root, file).replace(test_data_path, ""))
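One caveat with str.replace is that it removes every occurrence of the base path string, not just the leading one. A possible alternative, sketched here with the hypothetical helper add_tree_relative, is to compute the archive name with os.path.relpath. Like the loop above, it only adds files, so empty directories are not stored.

```python
import os
import tarfile
import tempfile


def add_tree_relative(tar_handle, base_dir):
    """Add every file under base_dir, named relative to base_dir."""
    for root, dirs, files in os.walk(base_dir):
        for file_name in files:
            full_path = os.path.join(root, file_name)
            # relpath strips only the leading base_dir portion of the path.
            tar_handle.add(full_path, arcname=os.path.relpath(full_path, base_dir))


# Demo on a made-up tree: <tmp>/updates/package1/{file1, file2, dir/file3}
demo_base = tempfile.mkdtemp()
data_dir = os.path.join(demo_base, "updates", "package1")
os.makedirs(os.path.join(data_dir, "dir"))
for rel in ("file1", "file2", os.path.join("dir", "file3")):
    with open(os.path.join(data_dir, rel), "w") as fh:
        fh.write("demo")

archive_path = os.path.join(demo_base, "package1.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar_handle:
    add_tree_relative(tar_handle, data_dir)

with tarfile.open(archive_path, "r:gz") as tar_handle:
    relative_names = sorted(tar_handle.getnames())
print(relative_names)
```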
