
How to extract a specific file from the .tar archive in python?

I have created a .tar file on a Linux machine as follows:

tar cvf test.tar test_folder/

where the test_folder contains some files as shown below:

test_folder 
|___ file1.jpg
|___ file2.jpg
|___ ...

I am unable to programmatically extract the individual files within the tar archive using Python. More specifically, I have tried the following:

import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    # img_file contains the object: <ExFileObject name='test_folder/test.tar'>

Here, img_file does not seem to contain the requested image; instead it appears to refer to the source .tar file. I am not sure where I am going wrong. Any suggestions would be really helpful. Thanks in advance.

You probably wanted to use the .extract() method instead of the .extractfile() method (see my other answer):

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    tar.extract('test_folder/file1.jpg')         # .extract()  instead of .extractfile()

Notes:

  1. Your extracted file will be in the (maybe newly created) folder test_folder under your current directory.

  2. The .extract() method returns None, so there is no need to assign its result (img_file = tar.extract(...)).
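A minimal, self-contained sketch of the .extract() approach, assuming you want the file somewhere other than the current directory: the path argument of extract() sets the destination folder, and the member's folder structure (test_folder/...) is recreated under it. The helper names make_demo_archive and extract_member, and the fake image bytes, are hypothetical and only there so the example runs on its own.

```python
import os
import tarfile
import tempfile


def make_demo_archive(tmp_dir):
    """Hypothetical stand-in for test.tar containing test_folder/file1.jpg (fake bytes)."""
    src = os.path.join(tmp_dir, "test_folder")
    os.makedirs(src, exist_ok=True)
    with open(os.path.join(src, "file1.jpg"), "wb") as f:
        f.write(b"fake jpeg bytes")
    archive = os.path.join(tmp_dir, "test.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname="test_folder")  # adds the folder and the files inside it
    return archive


def extract_member(archive_path, member_name, dest_dir):
    """Extract a single member under dest_dir and return the path it was written to."""
    with tarfile.open(archive_path, "r:") as tar:
        # extract() writes the member to disk, recreating its folder structure
        # under dest_dir (missing folders are created); its return value is None.
        tar.extract(member_name, path=dest_dir)
    return os.path.join(dest_dir, member_name)


with tempfile.TemporaryDirectory() as tmp:
    archive = make_demo_archive(tmp)
    written = extract_member(archive, "test_folder/file1.jpg", os.path.join(tmp, "out"))
    with open(written, "rb") as f:
        print(f.read())  # the member's bytes, read back from disk
```

Without a path argument, extract() uses the current directory, which is the behaviour described in note 1.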

Appending 2 lines to your code will solve your problem:

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    
    # --------------------- Add this ---------------------------
    with open("img_file.jpg", "wb") as outfile:
        outfile.write(img_file.read())

The explanation:

The .extractfile() method only gives you the content of the archived file (i.e. its data) as a file-like object.

It doesn't extract any file to the file system.

So you have to do that yourself: read the returned content (img_file.read()) and write it into a file of your choice (outfile.write(...)).
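A sketch of the same extractfile() idea that streams the data with shutil.copyfileobj instead of reading it all at once, and that guards against members for which extractfile() returns None (for example directories). The helpers save_member and build_demo_tar and the demo archive contents are hypothetical illustrations, not part of the original answer.

```python
import io
import os
import shutil
import tarfile
import tempfile


def save_member(archive_path, member_name, out_path):
    """Write the data of one archived file to out_path (nothing else is extracted)."""
    with tarfile.open(archive_path, "r:") as tar:
        src = tar.extractfile(member_name)  # KeyError if the name is not in the archive
        if src is None:
            # extractfile() returns None for members that are neither regular
            # files nor links, e.g. directories.
            raise ValueError(f"{member_name!r} is not a regular file")
        with src, open(out_path, "wb") as dst:
            # Copy in chunks rather than holding the whole file in memory via src.read().
            shutil.copyfileobj(src, dst)


def build_demo_tar(path):
    """Hypothetical stand-in for test.tar: one directory entry plus one small file."""
    data = b"fake jpeg bytes"
    with tarfile.open(path, "w") as tar:
        folder = tarfile.TarInfo("test_folder")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        info = tarfile.TarInfo("test_folder/file1.jpg")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "test.tar")
    build_demo_tar(archive)
    target = os.path.join(tmp, "img_file.jpg")
    save_member(archive, "test_folder/file1.jpg", target)
    print(os.path.getsize(target))  # size of the copied member in bytes
```

For small images the two-line read()/write() version is perfectly fine; copyfileobj mainly matters for large members.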

Or, to simplify your life, use the .extract() method instead. See my other answer.

This is because extractfile() returns an io.BufferedReader object: rather than extracting the file into your directory, it gives you a reader over the member's data, and that reader is what gets stored in your variable.

What you can do is extract the file with extract() and then open it in a separate context manager:

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    tar.extract('test_folder/file1.jpg')  # extract() writes the file to disk; extractfile() does not

with open('test_folder/file1.jpg', 'rb') as img:
    data = img.read()  # do something with img; here img is your image file
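If you only need the image data in memory (for example to pass it to an image library), you can skip the round trip through the file system. A minimal sketch, assuming an archive built in memory purely for the demo: the object returned by extractfile() is read directly and nothing is written to disk. read_member_bytes and build_demo_tar_bytes are hypothetical helper names.

```python
import io
import tarfile


def read_member_bytes(tar_bytes, member_name):
    """Return the raw bytes of one regular-file member from an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        # For a regular file, extractfile() returns an io.BufferedReader over
        # the member's own data, not over the whole archive.
        with tar.extractfile(member_name) as f:
            return f.read()


def build_demo_tar_bytes():
    """Hypothetical stand-in for test.tar, built entirely in memory."""
    buf = io.BytesIO()
    data = b"fake jpeg bytes"
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("test_folder/file1.jpg")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


payload = read_member_bytes(build_demo_tar_bytes(), "test_folder/file1.jpg")
print(len(payload))
# If a library expects a file-like object, wrap the bytes again: io.BytesIO(payload).
```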

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.