How do I get the original file's name from a .gz archive?

Question

I'm writing a utility that takes a .gz archive and checks whether its contents already exist in a specified folder. If they don't, it extracts the archive there.

My plan was to read the names of the files in the .gz archive one by one and check whether each file already exists in my directory. But from what I understand, this isn't possible with gzip.

Ideally, I'm looking for something like this:

archive = gzipfile.GzipFile(source)

for i in archive.getmembers():
    if os.path.isfile(destination + sep + i.name) and overwrite:
        ...

Is this possible?

Answer 1

A .gz file is not an archive; it is simply a single compressed file. If you have a .tar.gz file, you can use tarfile.
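For the .tar.gz case, the check-then-extract behaviour described in the question could look roughly like the sketch below. The function name `extract_missing` and its `overwrite` parameter are made up for illustration; only `tarfile.open`, `getmembers`, and `extract` come from the standard library.

```python
import os
import tarfile


def extract_missing(source, destination, overwrite=False):
    """Extract members of a .tar.gz that are not yet present in destination.

    Returns the names of the members that were extracted.
    (Hypothetical helper, not part of tarfile.)
    """
    extracted = []
    with tarfile.open(source, "r:gz") as archive:
        for member in archive.getmembers():
            target = os.path.join(destination, member.name)
            if os.path.isfile(target) and not overwrite:
                continue  # file already exists and overwriting is off
            archive.extract(member, destination)
            extracted.append(member.name)
    return extracted
```

Only extract archives you trust this way: member names in a tar file can contain absolute paths or `..` components.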

Answer 2

While it is true that a .gz file is simply a compressed file, the original file name can be truncated, and the compressed .gz file itself can be renamed. gunzip can be told to use the original file name with the -N flag; combined with -l (a lowercase L), it reports the original file name without decompressing the file.
For example:

$ gzip sometext.txt
$ mv sometext.txt.gz othertext.gz
$ gunzip -Nl othertext.gz
         compressed        uncompressed  ratio uncompressed_name
                 58                 113  76.1% sometext.txt

You can hack your way through this in Python as well.

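from subprocess import check_output
# -N: report the stored name, -l: list only, -q: suppress the header line
size_name = check_output(['gunzip', '-Nlq', 'othertext.gz']).decode()
# The name is whatever follows the ratio column, e.g. "76.1% sometext.txt"
size_name = size_name.strip().split("%", 1)
print("original filename =", size_name[1].strip())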

Result:

original filename = sometext.txt

I do not believe that the Python gzip package lets you access the original file name.
Someone else may know otherwise!
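To try this from Python without the command line, the renamed file from the shell example can be reproduced with the gzip module itself. This is a sketch only: the helper name `write_renamed_gzip` is made up, and it relies on `gzip.GzipFile` recording its `filename` argument in the header when writing.

```python
import gzip
import os
import tempfile


def write_renamed_gzip(directory):
    """Write othertext.gz whose header stores the name 'sometext.txt'."""
    path = os.path.join(directory, "othertext.gz")
    with open(path, "wb") as raw:
        # filename= is what GzipFile records in the header's name field
        with gzip.GzipFile(filename="sometext.txt", mode="wb", fileobj=raw) as gz:
            gz.write(b"some text\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_renamed_gzip(tmp)
    with gzip.open(path, "rb") as gz:
        print(gz.name)    # the on-disk path, not the stored name
        print(gz.read())  # the decompressed data
    with open(path, "rb") as raw:
        # the stored name sits right after the 10-byte fixed header
        print(b"sometext.txt" in raw.read(32))
```

When reading, the `name` attribute reflects the path that was opened, so the stored name has to be recovered some other way.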

Answer 3

Adding to the accepted answer:

At least CPython's gzip module does not expose the file name metadata: it simply discards it, as you can see in the source code.

However, the gzip file format (specified in RFC 1952), or at least its metadata, is easy enough to parse manually:

import struct

def getGzipName(path):
    with open(path, 'rb') as file:
        id1, id2, compression, flags, mtime, extraFlags, osId = struct.unpack('<BBBBLBB', file.read(10))
        if id1 != 0x1F or id2 != 0x8B or compression != 0x08:
            return None

        # Extra Field (e.g. used by bgzip to store the length of the compressed block):
        # a little-endian 16-bit length followed by that many bytes, which are skipped
        if flags & ( 1 << 2 ) != 0:
            file.read(struct.unpack('<H', file.read(2))[0])

        # File Name Field
        if flags & ( 1 << 3 ) != 0:
            name = b''
            c = file.read(1)
            while c and c != b'\0':  # stop at the terminator, or at EOF on a truncated file
                name += c
                c = file.read(1)
            return name.decode()

    return None

Note that gzip could theoretically be used as an archive format, because it supports storing original file names (which could hold paths) and because multiple gzip streams, each with a different file name, may be concatenated. However, not even the gzip tool supports such exotic gzip files, even with the --name option. It simply appends the data of the second gzip stream to the file named after the first stream's original file name.
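The concatenation behaviour can be illustrated with Python's gzip module, which also joins the data of concatenated streams and ignores the per-stream names. This is a small sketch; `gzip_member` is a made-up helper, and it shows the Python module rather than the gzip command-line tool.

```python
import gzip
import io


def gzip_member(data, name):
    """Return one complete gzip stream whose header stores `name`."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf) as f:
        f.write(data)
    return buf.getvalue()


# Two independent streams with different stored names, glued together
combined = gzip_member(b"first\n", "a.txt") + gzip_member(b"second\n", "b.txt")

# Decompression yields the joined data; the two names are not surfaced
print(gzip.decompress(combined))
```

Both names are still present in the raw bytes, so a reader that understands the stream boundaries could in principle recover them.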

Answer 4

import tarfile

archive = tarfile.open(source)
for i in archive.getmembers():
    ...

4 answers

Answer 1: 3
Answer 2: 2, accepted, 2016-02-17 16:26:33
Answer 3: 2, 2022-04-03 18:38:15
Answer 4: -1, 2016-02-17 15:09:53