
Why aren't file contents getting copied into my tarfile?

Here's some code meant to copy the contents of a zipfile to a tarfile. Later I intend to limit the copying to files that appear in a list passed in as a further argument, but for now I'm just trying to get copying to work.

import zipfile, tempfile, shutil, tarfile, os

def gather_and_repackage_files(zip_file_path, target_file_path) :
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                # skip directories
                if not filename:
                    continue

                print "File: ", filename
                # copy file (taken from zipfile's extract)
                source = zip_file.open(member)
                with tempfile.NamedTemporaryFile(delete=False) as temp:
                    print temp.name
                    shutil.copyfileobj(source, temp)
                    tar.add(temp.name, arcname=filename)


gather_and_repackage_files("./stuff.zip", "./tarfile.tar")

Before I run this, the contents of my directory are "testin.py" (the program above) and "stuff.zip". "stuff.zip" is a zipfile containing two tiny text files, a.txt and b.txt, each of which contains about 15 characters. Apparently it also contains Mac backups of these, "._a.txt" and "._b.txt" (although when I expand it with the Archive Utility, those do not appear, even with "ls -al").

After execution (Python 2.7.10), there's an additional file "tarfile.tar"; when I open this with the Archive Utility on my Mac, I see this:

drwx------  6 jfh  staff  204 Oct 29 16:51 .
drwxr-xr-x  7 jfh  staff  238 Oct 29 16:51 ..
-rw-------  1 jfh  staff    0 Oct 29 16:50 ._a.txt
-rw-------  1 jfh  staff    0 Oct 29 16:50 ._b.txt
-rw-------  1 jfh  staff    0 Oct 29 16:50 a.txt
-rw-------  1 jfh  staff    0 Oct 29 16:50 b.txt

The temporary files created during execution actually DO contain the 15 or so characters of silly text, but the ones in the tarfile are zero-length.

So my question is: why does the tarfile contain 0-length versions of a.txt and b.txt?

The temp file may not have been completely flushed when tar.add reads it.

You could try calling temp.flush() and then os.fsync(temp.fileno()) before adding the file to the tar.
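A minimal sketch of that buffering effect, written so that it also runs on Python 3. The helper name sizes_before_and_after_flush is made up for illustration, and the exact size seen before the flush depends on Python's buffering, so treat the first number as indicative only.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload):
    """Return the on-disk size of a temp file before and after flushing.

    Hypothetical helper, for illustration only.
    """
    with tempfile.NamedTemporaryFile() as temp:
        temp.write(payload)
        # A small write usually still sits in Python's write buffer here,
        # so anything that looks at the file by name may see less data.
        size_before = os.path.getsize(temp.name)

        temp.flush()             # hand Python's buffer to the OS
        os.fsync(temp.fileno())  # ask the OS to commit it to disk
        size_after = os.path.getsize(temp.name)
    return size_before, size_after


before, after = sizes_before_and_after_flush(b"some silly text")
print("before flush: %d bytes, after flush: %d bytes" % (before, after))
```

Note that os.fsync takes a file descriptor, hence temp.fileno(). For tar.add, which reads the file through the OS, temp.flush() alone is normally what matters.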

But of course it would be better not to create the temp file in the first place, which you can avoid by using tar.addfile instead of tar.add.

You also need to set the size of the TarInfo that you provide, because tar.addfile copies exactly tarinfo.size bytes from the file object.

Note: you could also set mtime to preserve the modification time.

This modification should do it:

import zipfile
import tarfile
import os

def gather_and_repackage_files(zip_file_path, target_file_path) :
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for info in zip_file.infolist():
                filename = os.path.basename(info.filename)
                # skip directories
                if not filename:
                    continue

                # stream the member straight from the zip into the tar
                with zip_file.open(info) as source:
                    tarinfo = tarfile.TarInfo(filename)
                    # addfile copies exactly tarinfo.size bytes from source
                    tarinfo.size = info.file_size
                    tar.addfile(tarinfo, source)


gather_and_repackage_files("./stuff.zip", "./tarfile.tar")
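The note about mtime could be sketched roughly as follows. This is only an illustration: the function name repackage_with_mtime and the throwaway test archive are made up, and interpreting ZipInfo.date_time as local time is an assumption, since zip timestamps carry no time zone.

```python
import os
import shutil
import tarfile
import tempfile
import time
import zipfile


def repackage_with_mtime(zip_file_path, target_file_path):
    """Copy regular files from a zip into a tar, keeping modification times."""
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for info in zip_file.infolist():
                filename = os.path.basename(info.filename)
                if not filename:  # directory entry
                    continue
                tarinfo = tarfile.TarInfo(filename)
                tarinfo.size = info.file_size
                # date_time is (year, month, day, hour, minute, second);
                # pad it to a 9-field struct_time tuple, with -1 meaning
                # "let mktime work out daylight saving" (local time assumed).
                stamp = tuple(info.date_time) + (0, 0, -1)
                tarinfo.mtime = int(time.mktime(stamp))
                with zip_file.open(info) as source:
                    tar.addfile(tarinfo, source)


# Small self-check using a throwaway archive in a temporary directory.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "stuff.zip")
tar_path = os.path.join(work_dir, "tarfile.tar")
with zipfile.ZipFile(zip_path, "w") as zf:
    entry = zipfile.ZipInfo("docs/a.txt", date_time=(2015, 10, 29, 16, 50, 0))
    zf.writestr(entry, b"some silly text")

repackage_with_mtime(zip_path, tar_path)

with tarfile.open(tar_path) as tar:
    member = tar.getmember("a.txt")
    copied = tar.extractfile(member).read()
print("%s: %d bytes, mtime %d" % (member.name, member.size, member.mtime))
shutil.rmtree(work_dir)
```

Without an explicit mtime, a fresh TarInfo defaults to 0, which extracts as 1 January 1970.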

Here is working code:

import zipfile, tempfile, shutil, tarfile, os

def gather_and_repackage_files(zip_file_path, target_file_path) :
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                # skip directories
                if not filename:
                    continue

                print "File: ", filename
                print "Member: ", member
                source = zip_file.open(member)
                with tempfile.NamedTemporaryFile(delete=False) as temp:
                    print temp.name

                    shutil.copyfileobj(source, temp)

                    # close first so the buffered data reaches the disk
                    # before tar.add reads the file by name
                    temp.close()
                    tar.add(temp.name, arcname=filename)

The secret sauce is 'temp.close()', one line before the end. It turns out that you can't reliably add a file that is still open for writing to a tar archive (although the documentation doesn't seem to mention that): tar.add reads the file from disk by name, so anything still sitting in the open file's write buffer is missed.
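For comparison, here is a sketch of the same temp-file approach that flushes instead of closing, so the default delete=True can clean up each temporary file. It assumes a POSIX-like system such as the Mac in the question (on Windows an open NamedTemporaryFile generally cannot be reopened by name), and the function name and test archive are made up for illustration.

```python
import os
import shutil
import tarfile
import tempfile
import zipfile


def repackage_via_temp_files(zip_file_path, target_file_path):
    """Copy regular files from a zip into a tar through temporary files."""
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                if not filename:  # directory entry
                    continue
                with zip_file.open(member) as source:
                    # delete=True (the default) removes the file on close
                    with tempfile.NamedTemporaryFile() as temp:
                        shutil.copyfileobj(source, temp)
                        # write the buffered bytes out before tar.add
                        # reads the file by name
                        temp.flush()
                        tar.add(temp.name, arcname=filename)


# Small self-check using a throwaway archive in a temporary directory.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "stuff.zip")
tar_path = os.path.join(work_dir, "tarfile.tar")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("a.txt", b"some silly text")
    zf.writestr("sub/b.txt", b"more silly text")

repackage_via_temp_files(zip_path, tar_path)

with tarfile.open(tar_path) as tar:
    sizes = dict((m.name, m.size) for m in tar.getmembers())
print(sorted(sizes.items()))
shutil.rmtree(work_dir)
```

The version with delete=False and temp.close() works too, but it leaves one temporary file behind per archive member unless you remove them yourself.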
