
Is it possible to extract a single file from a tar bundle in Python?

I need to fetch a couple of files from a huge svn repo. Fetching the whole repo takes almost an hour. The files I am looking for are part of a tar bundle.

Is it possible to fetch only those two files from the tar bundle, without extracting the whole bundle, using Python code?

If so, can anybody tell me how I should go about it?

Perhaps you want something like this?

#!/usr/local/cpython-3.3/bin/python

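import tarfile

def main():
    # Open the archive read-only; only the requested member is extracted
    with tarfile.open('tar-archive.tar', 'r') as archive:
        read_into_memory = False
        if read_into_memory:
            # extractfile() returns a file-like object; nothing is written to disk
            file_ = archive.extractfile('etc/protocols')
            print(file_.read())
        else:
            # extract() writes just this one member under the current directory
            archive.extract('etc/protocols')

main()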

Here is one way to get a tar file from svn and extract a single file from it:

import tarfile
from subprocess import check_output
# Capture the tar file from subversion
tmp = '/home/me/tempfile.tar'
with open(tmp, 'wb') as f:
    f.write(check_output(["svn", "cat", "svn://url/some.tar"]))
# Extract the file we want, saving it under dir2
with tarfile.open(tmp) as t:
    t.extract('dir1/fname.ext', path='dir2')

where 'dir1/fname.ext' is the full path to the file that you want within the tar archive. It will be saved in 'dir2/dir1/fname.ext'. If you omit the path argument, it will be saved in 'dir1/fname.ext' under the current directory.

The above can be understood as follows. On a normal shell command line, svn cat url tells subversion to send the file identified by url to stdout (see svn help cat for more info). url can be any type of URL that svn understands, such as svn://..., svn+ssh://..., or file://.... We run this command under Python's control using the subprocess module. To do this, the svn cat url command is broken up into a list: ["svn", "cat", "url"]. The output of this svn command is saved to the local file named by the tmp variable. We then use the tarfile module to extract the file you want.
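The same steps could also be wrapped in a small helper that uses a temporary file instead of a fixed path. This is only a sketch: the name fetch_and_extract is made up for illustration, and it assumes the command's whole output fits in memory, since check_output returns it as a single bytes object.

```python
import os
import subprocess
import tarfile
import tempfile


def fetch_and_extract(command, member, dest='.'):
    """Run `command`, treat its stdout as a tar archive, and extract one member.

    Hypothetical helper: `command` is an argument list such as
    ["svn", "cat", "svn://url/some.tar"].
    """
    # Save the command's output to a temporary file so tarfile can seek in it.
    with tempfile.NamedTemporaryFile(suffix='.tar', delete=False) as tmp:
        tmp.write(subprocess.check_output(command))
    try:
        with tarfile.open(tmp.name) as archive:
            archive.extract(member, path=dest)
    finally:
        os.remove(tmp.name)  # remove the temporary archive in every case
    return os.path.join(dest, member)
```

Calling fetch_and_extract(["svn", "cat", "svn://url/some.tar"], 'dir1/fname.ext', 'dir2') would then leave the file in 'dir2/dir1/fname.ext', as in the code above.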

Alternatively, you could use the extractfile method to capture the file data to a python variable:

with tarfile.open(tmp) as t:
    handle = t.extractfile('dir1/fname.ext')
    print(handle.readlines())  # show file contents

According to the documentation, tarfile should accept a subprocess's stdout as a file handle. This would simplify the code and eliminate the need to save the tar file locally. However, due to a bug (Issue 10436), that will not work.
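For what it's worth, tarfile also has stream modes ('r|', 'r|*') intended for non-seekable inputs such as a pipe. The sketch below assumes that the limitation mentioned above only affects looking a member up by name, and that walking the archive in order and reading the wanted member as it goes by is acceptable. The helper name read_member_from_stream is made up for illustration, and I have not tried it against a real svn pipe.

```python
import tarfile


def read_member_from_stream(fileobj, wanted):
    """Return the bytes of member `wanted` from a tar stream, or None if absent.

    Hypothetical helper; `fileobj` may be non-seekable (e.g. a pipe).
    """
    # 'r|*' reads the archive strictly front to back (compression is
    # auto-detected), so a member must be read while the stream is on it.
    with tarfile.open(fileobj=fileobj, mode='r|*') as archive:
        for member in archive:
            if member.name == wanted and member.isfile():
                return archive.extractfile(member).read()
    return None


# Possible usage with svn (untested assumption, placeholder URL):
#   import subprocess
#   proc = subprocess.Popen(["svn", "cat", "svn://url/some.tar"],
#                           stdout=subprocess.PIPE)
#   data = read_member_from_stream(proc.stdout, 'dir1/fname.ext')
#   proc.stdout.close()
#   proc.wait()
```

The trade-off is that the archive is still transferred up to the point where the wanted member appears, but nothing has to be written to disk.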

It sounds like you have two parts to your question:

  1. Fetching a single tar bundle from the SVN repo, without the rest of the repo's files.
  2. Using Python to extract two files from the retrieved bundle.

For the first part, I'll simply refer to this post on svn export and sparse checkouts.

For the second part, here is a solution for extracting the two files from the retrieved tarball:

import tarfile

files_i_want = ['path/to/file1','path/to/file2']

with tarfile.open("bundle.tar") as tar:
    tar.extractall(members=[x for x in tar.getmembers() if x.name in files_i_want])
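One thing to watch with this approach is that a misspelled path is silently skipped, because the filter simply matches nothing. A minimal sketch of a variant that reports which requested names were not found; the function name extract_selected and its dest parameter are my own, not part of tarfile:

```python
import tarfile


def extract_selected(archive_path, wanted, dest='.'):
    """Extract only the members named in `wanted`; return the names not found.

    Hypothetical helper built on the extractall(members=...) call above.
    """
    wanted = set(wanted)
    with tarfile.open(archive_path) as tar:
        # Keep only the TarInfo objects whose names were requested
        members = [m for m in tar.getmembers() if m.name in wanted]
        tar.extractall(path=dest, members=members)
    # Anything requested but not present in the archive
    return wanted - {m.name for m in members}
```

For example, missing = extract_selected("bundle.tar", files_i_want) gives an empty set when both files were present in the bundle.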

