How to extract a specific file from a .tar archive in Python?

I have created a .tar file on a Linux machine as follows:

tar cvf test.tar test_folder/

where test_folder contains some files, as shown below:

test_folder 
|___ file1.jpg
|___ file2.jpg
|___ ...

I am unable to programmatically extract individual files from the tar archive using Python. More specifically, I have tried the following:

import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    # img_file contains the object: <ExFileObject name='test_folder/test.tar'>

Here, img_file does not seem to contain the requested image; it appears to refer to the source .tar file instead. I am not sure where I am going wrong. Any suggestions would be really helpful. Thanks in advance.

You probably wanted to use the .extract() method instead of the .extractfile() method (see my other answer):

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    tar.extract('test_folder/file1.jpg')         # .extract()  instead of .extractfile()

Notes:

  1. Your extracted file will be in the (possibly newly created) folder test_folder under your current directory.

  2. The .extract() method returns None, so there is no need to assign its result (img_file = tar.extract(...)).
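
If the file should land somewhere other than the current directory, .extract() also accepts a path argument. The following is a minimal sketch of that, not part of the original answer; the helper name extract_member and the destination directory are hypothetical.

```python
import os
import tarfile


def extract_member(tar_path, member_name, dest_dir):
    """Extract one member of tar_path under dest_dir and return its path on disk.

    extract_member is a hypothetical helper used only for illustration.
    """
    with tarfile.open(tar_path, "r:") as tar:
        # path= selects the destination directory; extract() itself returns None.
        tar.extract(member_name, path=dest_dir)
    # The member keeps its archive-relative path below dest_dir.
    return os.path.join(dest_dir, member_name)
```

For the archive in the question, a call such as extract_member('test.tar', 'test_folder/file1.jpg', 'out') would be expected to produce out/test_folder/file1.jpg, where 'out' is an assumed directory name.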

Appending 2 lines to your code will solve your problem:

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    
    # --------------------- Add this ---------------------------
    with open("img_file.jpg", "wb") as outfile:
        outfile.write(img_file.read())

The explanation:

The .extractfile() method only gives you the content of the requested file (i.e. its data).

It does not extract any file to the file system.

So you have to do it yourself, by reading the returned content (img_file.read()) and writing it to a file of your choice (outfile.write(...)).
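
For large members, reading everything with a single .read() call holds the whole file in memory. A possible variation, sketched here under the assumption that the member is a regular file, copies the data in chunks with shutil.copyfileobj. The helper name save_member_as is hypothetical.

```python
import shutil
import tarfile


def save_member_as(tar_path, member_name, out_path):
    """Copy one archive member to out_path in chunks.

    save_member_as is a hypothetical helper used only for illustration.
    """
    with tarfile.open(tar_path, "r:") as tar:
        # Raises KeyError if member_name is not in the archive.
        src = tar.extractfile(member_name)
        if src is None:
            # extractfile() returns None for members such as directories.
            raise ValueError(f"{member_name!r} is not a regular file")
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

With the names from the question, save_member_as('test.tar', 'test_folder/file1.jpg', 'img_file.jpg') should write the same file as the two appended lines above.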

Or, to simplify your life, use the .extract() method instead. See my other answer.

This is because extractfile() returns an io.BufferedReader object. It does not write anything to your directory; your variable only holds a reader over the member's data.

What you can do is extract the file with extract(), then open the extracted file in a separate context manager:

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    # extract() writes the member to disk; extractfile() would not
    tar.extract('test_folder/file1.jpg')

with open('test_folder/file1.jpg', 'rb') as img:
    # do something with img; here img is your image file
    data = img.read()
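
If the image is only needed in memory, writing it to disk can be skipped altogether. The sketch below reads the member's bytes through extractfile() and returns them; the helper name read_member_bytes is hypothetical, and it assumes the member is small enough to fit in memory.

```python
import tarfile


def read_member_bytes(tar_path, member_name):
    """Return the raw bytes of one regular-file member without touching disk.

    read_member_bytes is a hypothetical helper used only for illustration.
    """
    with tarfile.open(tar_path, "r:") as tar:
        # Raises KeyError if member_name is not in the archive.
        reader = tar.extractfile(member_name)
        if reader is None:
            # extractfile() returns None for members such as directories.
            raise ValueError(f"{member_name!r} is not a regular file")
        with reader:
            # Read while the archive is still open; the reader depends on it.
            return reader.read()
```

The returned bytes can be wrapped in io.BytesIO when a file-like object is needed, for example to hand the data to an image library.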
