How to read and list a tgz file in python3?

In Python 3 (3.6.8) I want to read a gzipped tar file and list its contents.

I found this solution, which yields an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Searching for this error, I found this suggestion, so I tried the following code snippet:

with open(out_file) as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    tar = tarfile.open(gzip_fd.read())

which yields the same error!

So how do I do it right?

After looking at the actual documentation, I came up with the following code:

tar = tarfile.open(out_file, "w:gz")
for member in tar.getnames():
   print(tar.extractfile(member).read())

which finally ran without errors, but did not print any of the tar archive's contents on the screen!

The tar file is well formed and contains folders and files. (I need to try to share this file.)

When you open a file without specifying a mode, it defaults to reading it as text. You need to open the file as a raw byte stream using the mode='rb' flag, then feed it to the gzip reader:

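import gzip
import tarfile
with open(out_file, mode='rb') as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    # tarfile.open() expects a file name first; pass the decompressed stream via fileobj
    tar = tarfile.open(fileobj=gzip_fd, mode='r:')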
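
The 0x8b in the error message is the second byte of the gzip magic number (0x1f 0x8b). That byte sequence is not valid UTF-8, so a text-mode open() fails as soon as it tries to decode the file. Below is a minimal sketch of the binary-mode approach. The helper names make_sample_tgz, list_tgz_via_gzip and first_two_bytes, as well as the sample file names, are made up for illustration, and the archive is built in a temporary directory so the snippet runs on its own.

```python
import gzip
import io
import os
import tarfile
import tempfile


def make_sample_tgz(path):
    """Create a small gzipped tar archive with one text file (illustrative data)."""
    payload = b"hello from inside the archive\n"
    with tarfile.open(path, "w:gz") as tar:
        info = tarfile.TarInfo(name="folder/hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


def first_two_bytes(path):
    """Return the first two bytes of the file (the gzip magic number for a .tgz)."""
    with open(path, "rb") as fd:
        return fd.read(2)


def list_tgz_via_gzip(path):
    """Open the file in binary mode, decompress with gzip, and list the tar members."""
    with open(path, mode="rb") as fd:
        with gzip.GzipFile(fileobj=fd) as gzip_fd:
            # "r:" means plain tar: the stream is already decompressed by gzip
            with tarfile.open(fileobj=gzip_fd, mode="r:") as tar:
                return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.tgz")
        make_sample_tgz(sample)
        print(first_two_bytes(sample))
        print(list_tgz_via_gzip(sample))
```

Note that the tar object is only usable while the underlying file is still open, so any listing or reading has to happen inside the with blocks.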

The python-archive module (available via pip) could help you:

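from archive import extract

file = "your/file.tgz"
try:
    extract(file, "out/%s.raw" % (file), ext=".tgz")
except Exception as exc:
    # could not extract; report the reason instead of silently ignoring it
    print("could not extract %s: %s" % (file, exc))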

Available extensions are (v0.2): '.zip', '.egg', '.jar', '.tar', '.tar.gz', '.tgz', '.tar.bz2', '.tz2'

More info: https://pypi.org/project/python-archive/
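
If adding a dependency is not an option, the standard library can do a similar extraction. The sketch below uses shutil.unpack_archive, which is a swapped-in technique and not part of python-archive. The helper name extract_tgz and the sample paths are hypothetical.

```python
import io
import os
import shutil
import tarfile
import tempfile


def extract_tgz(archive_path, out_dir):
    """Extract a gzipped tar archive into out_dir and return the extracted file paths."""
    # format="gztar" is given explicitly so the file extension does not matter
    shutil.unpack_archive(archive_path, out_dir, format="gztar")
    found = []
    for root, _dirs, files in os.walk(out_dir):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), out_dir))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny sample archive so the example is self-contained
        archive = os.path.join(tmp, "sample.tgz")
        payload = b"example\n"
        with tarfile.open(archive, "w:gz") as tar:
            info = tarfile.TarInfo(name="data/example.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        print(extract_tgz(archive, os.path.join(tmp, "out")))
```

As with any extraction tool, this should only be pointed at archives from a source you trust.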

Not sure why it did not work before, but the following solution works for me to list the files and folders of a gzipped tar archive with Python 3.6:

import tarfile

with tarfile.open(filename, "r:gz") as tar:
    print(tar.getnames())
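
A likely reason the earlier "w:gz" attempt printed nothing is that the "w" modes open the archive for writing and start a new, empty archive, overwriting the existing file, whereas "r:gz" opens it for reading. As a small sketch going one step beyond getnames(), the following lists each member with its type and size. The function names describe_tgz and build_sample and the sample archive contents are assumptions for illustration.

```python
import io
import os
import tarfile
import tempfile


def describe_tgz(path):
    """Return (name, kind, size) tuples for every member of a gzipped tar archive."""
    rows = []
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isdir():
                kind = "dir"
            elif member.isfile():
                kind = "file"
            else:
                kind = "other"
            rows.append((member.name, kind, member.size))
    return rows


def build_sample(path):
    """Write a tiny archive with one folder and one file (illustrative data only)."""
    data = b"abc"
    with tarfile.open(path, "w:gz") as tar:
        folder = tarfile.TarInfo(name="docs")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        info = tarfile.TarInfo(name="docs/readme.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "sample.tgz")
        build_sample(archive)
        for name, kind, size in describe_tgz(archive):
            print(f"{kind:5} {size:6} {name}")
```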
