
How can I extract files to a different destination filename from tarfile in Python?

I have a tarfile.TarFile from which I would like to extract some files to a modified destination filename; there is an existing file with the same name as the archive member that I do not want to touch. Specifically, I want to append a suffix, e.g. a member in the archive called foo/bar.txt should be extracted as foo/bar.txt.mysuffix.

The two somewhat obvious but also somewhat unsatisfactory approaches are:

  • extract each file using extractfile, create the renamed file and copy the content using shutil.copyfileobj; however, this is either limited to regular files, or all the special handling implemented in tarfile (e.g. for sparse files, symlinks, directories etc.) would have to be replicated.
  • extractall to a temporary directory and then rename and copy to the destination; this just feels unnecessarily convoluted, requires more interaction with the host system and introduces new failure modes, and it seems easy to get this subtly wrong (e.g. see the warnings on shutil.copy/copy2).
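For reference, the first approach might look roughly like the sketch below. It handles regular files only, which is exactly the limitation described above. The function name and its parameters are hypothetical, and the sketch assumes a trusted archive, since member names are joined to the destination without any sanitising.

```python
import os
import shutil
import tarfile


def copy_regular_members(archive_path, dest_dir, suffix):
    """Copy each regular file member to <dest_dir>/<member name><suffix>.

    Directories, symlinks, devices etc. are skipped, and permissions and
    timestamps from the archive are not applied.
    """
    written = []
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if not member.isreg():
                continue
            target = os.path.join(dest_dir, member.name + suffix)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # extractfile() returns a read-only file object for regular members
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```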

Is there no interface or hook on the TarFile that would allow implementing this concisely and correctly?

There is a TarFile.getmembers() method, which returns the members of the archive as a list. You can loop over that list and choose which files to extract. Depending on the size of your tar, your second approach can be viable too, but it is not the best.

import tarfile

with tarfile.open('example.tar', 'r') as tar:
    for member in tar.getmembers():
        if "whatever" in member.name:
            # extracted under example_dir, keeping the member's own name
            tar.extract(member, "example_dir")
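The loop above selects members but still extracts them under their original names. One way to get a different destination name, sketched here under the assumption that only regular files need the suffix, is to pass extract() a copy of the TarInfo whose name has been changed; extract() accepts a TarInfo object and builds the target path from its name. The function name and parameters are hypothetical.

```python
import copy
import tarfile


def extract_with_suffix(archive_path, dest_dir, suffix):
    """Extract regular files as <name><suffix>; other member types are skipped."""
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if not member.isreg():
                continue
            # Work on a copy so the archive's own TarInfo keeps its name.
            renamed = copy.copy(member)
            renamed.name = member.name + suffix
            # The copy still points at the member's data inside the archive,
            # so tarfile writes that data (and mode/mtime) to the new path.
            tar.extract(renamed, path=dest_dir)
```

Because the extraction itself is still done by tarfile, the file at the unsuffixed path is never opened. Links and directories would need their own renaming rule if they should be extracted as well.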

Looking through Lib/tarfile.py, I came across this comment:

    #--------------------------------------------------------------------------
    # Below are the different file methods. They are called via
    # _extract_member() when extract() is called. They can be replaced in a
    # subclass to implement other functionality.

    def makedir(self, tarinfo, targetpath):
       #...
    
    def makefile(self, tarinfo, targetpath):
       # ...

These methods are not mentioned in the official reference documentation, but they appear to be fair game. To override these on an existing open TarFile instance, we can create a subclass Facade/Wrapper:

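import tarfile

class SuffixingTarFile(tarfile.TarFile):
    def __init__(self, suffix: str, wrapped: tarfile.TarFile):
        # TarFile.__init__ is deliberately not called: archive state stays in `wrapped`
        self.suffix = suffix
        self.wrapped = wrapped

    def __getattr__(self, attr):
        # only reached when normal lookup fails, so missing state comes from `wrapped`
        return getattr(self.wrapped, attr)

    def makefile(self, tarinfo, targetpath):
        # write the member's data to the suffixed path instead of targetpath
        super().makefile(tarinfo, targetpath + self.suffix)

    # override makedir, makelink, makefifo, etc. as desired
    # caveat: after makefile(), extraction still calls chown/chmod/utime with the
    # unsuffixed targetpath; override those too if a file there must stay untouched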

Example:

tar = tarfile.open(...)
star = SuffixingTarFile(".foo", tar)
star.extractall()  # extracts all (regular) file members with .foo suffix appended
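One thing to watch with this hook: after makefile() returns, tarfile goes on to call chown(), chmod() and utime() with the original, unsuffixed target path, so an existing file at that path could have its mode or timestamp changed. A variant that redirects those hooks as well might look like the sketch below. It assumes the archive is opened through the subclass's inherited open() classmethod rather than wrapped afterwards; the class, helper and function names are hypothetical, and the hook signatures are taken from Lib/tarfile.py rather than from the documented API.

```python
import tarfile


class SuffixedExtractTarFile(tarfile.TarFile):
    """Hypothetical subclass: regular files land at <targetpath><suffix>."""

    suffix = ""  # set on the instance before extracting

    def _suffixed(self, tarinfo, targetpath):
        # Only regular files are renamed; directories, links etc. keep their paths.
        return targetpath + self.suffix if tarinfo.isreg() else targetpath

    def makefile(self, tarinfo, targetpath):
        super().makefile(tarinfo, self._suffixed(tarinfo, targetpath))

    def chown(self, tarinfo, targetpath, *args, **kwargs):
        super().chown(tarinfo, self._suffixed(tarinfo, targetpath), *args, **kwargs)

    def chmod(self, tarinfo, targetpath):
        super().chmod(tarinfo, self._suffixed(tarinfo, targetpath))

    def utime(self, tarinfo, targetpath):
        super().utime(tarinfo, self._suffixed(tarinfo, targetpath))


def extract_all_with_suffix(archive_path, dest_dir, suffix):
    # open() is a classmethod, so calling it on the subclass yields an instance of it
    with SuffixedExtractTarFile.open(archive_path) as tar:
        tar.suffix = suffix
        tar.extractall(dest_dir)
```

Since these hooks are undocumented, their signatures may differ between Python versions, so it is worth checking them against the tarfile.py of the interpreter in use.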
