How to read a zip file from within a zip file in Python?

Question

I have a file that I want to read that is itself zipped within a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble reading child.zip. Can anyone correct my code?

I assume that I need to create child.zip as a file-like object and then open it with a second zipfile.ZipFile instance, but, being new to Python, I suspect my zipfile.ZipFile(zfile.open(name)) is silly. It raises zipfile.BadZipfile: "File is not a zip file" on child.zip, which I validated independently.

import logging
import re
import zipfile

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            with zipfile.ZipFile(zfile.open(name)) as zfile2:  # fails: "File is not a zip file"
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal file: " + name2)
                    print("Processing code goes here")

Answer 1

When you use the .open() call on a ZipFile instance, you do indeed get an open file handle. However, to read a zip file, the ZipFile class needs a little more: it has to locate the central directory at the end of the archive and then jump back to each member, so it needs to be able to seek on that file, and the object returned by .open() is not seekable in your case. Only Python 3.7 and newer produce a ZipExtFile object that supports seeking (provided the underlying file handle for the outer zip file is seekable, and nothing is trying to write to the ZipFile object).
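A minimal sketch of what that means in practice, assuming Python 3.7 or newer, where ZipExtFile objects became seekable. The helper names `build_nested_archive` and `read_child_text` are hypothetical; the archive mirrors the question's layout (parent.zip containing child.zip containing child.txt) but is built in memory. The code asks the member handle whether it is seekable and falls back to an in-memory copy when it is not.

```python
import io
import zipfile


def build_nested_archive():
    """Build parent.zip in memory: it contains child.zip, which contains child.txt."""
    child_buf = io.BytesIO()
    with zipfile.ZipFile(child_buf, "w") as child:
        child.writestr("child.txt", "hello from child\n")
    parent_buf = io.BytesIO()
    with zipfile.ZipFile(parent_buf, "w") as parent:
        parent.writestr("child.zip", child_buf.getvalue())
    parent_buf.seek(0)
    return parent_buf


def read_child_text(parent_file):
    """Return the text of child.txt from child.zip inside parent_file."""
    with zipfile.ZipFile(parent_file) as outer:
        with outer.open("child.zip") as member:
            if member.seekable():
                # Python 3.7+: the member handle supports seek()/tell(),
                # so ZipFile can read the inner archive directly.
                inner = zipfile.ZipFile(member)
            else:
                # Older versions: copy the member into a seekable buffer first.
                inner = zipfile.ZipFile(io.BytesIO(member.read()))
            with inner:
                return inner.read("child.txt").decode("utf-8")
```

Checking seekable() rather than the interpreter version keeps the sketch working on older versions too, at the cost of reading the whole inner archive into memory there.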

The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable), and feed that to ZipFile:

from io import BytesIO

# ...
zfiledata = BytesIO(zfile.read(name))
with zipfile.ZipFile(zfiledata) as zfile2:
    ...  # work with zfile2 as before

Or, in the context of your example:

import logging
import re
import zipfile
from io import BytesIO

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip; copy it into a seekable in-memory file
            zfiledata = BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal file: " + name2)
                    print("Processing code goes here")
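The "Processing code goes here" placeholder could, for instance, read each inner member's bytes. The sketch below (the function name `read_inner_members` is hypothetical) uses the same BytesIO workaround, swaps the regular expression for `str.endswith('.zip')`, and collects the results into a dictionary keyed by (outer name, inner name).

```python
import io
import zipfile


def read_inner_members(parent_path_or_file):
    """Return {(outer_name, inner_name): bytes} for files inside zips nested one level deep."""
    contents = {}
    with zipfile.ZipFile(parent_path_or_file) as zfile:
        for name in zfile.namelist():
            if not name.endswith(".zip"):
                continue  # only descend into nested zip archives
            # Copy the inner archive into a seekable in-memory buffer.
            zfiledata = io.BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    contents[(name, name2)] = zfile2.read(name2)
    return contents
```

Everything collected stays in the returned dictionary, so for large archives you would rather process each member inside the inner loop.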

Answer 2

To get this to work with Python 3.3 (under Windows, though that might be irrelevant), I had to do:

import io
import re
import zipfile

file = "parent.zip"  # path to the outer archive
with zipfile.ZipFile(file, 'r') as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            zfiledata = io.BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    print(name2)

cStringIO does not exist in Python 3, so I used io.BytesIO.
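The reason io.BytesIO is the right replacement: zip data is bytes, and in Python 3 io.StringIO only accepts text. A small sketch (the helper `zip_bytes_with` is hypothetical) showing BytesIO working and StringIO rejecting the same data:

```python
import io
import zipfile


def zip_bytes_with(name, text):
    """Return the raw bytes of a one-member zip archive (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, text)
    return buf.getvalue()


raw = zip_bytes_with("child.txt", "hi\n")

# io.BytesIO holds bytes and is seekable, so ZipFile accepts it.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    print(zf.namelist())

# io.StringIO only holds str; passing bytes raises TypeError.
try:
    io.StringIO(raw)
except TypeError as exc:
    print("StringIO rejected bytes:", exc)
```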

Answer 3

Here's a function I came up with. (Copied from here.)

import os
import zipfile
from io import BytesIO


def extract_nested_zipfile(path, parent_zip=None):
    """Returns a ZipFile specified by path, even if the path contains
    intermediary ZipFiles.  For example, /root/gparent.zip/parent.zip/child.zip
    will return a ZipFile that represents child.zip
    """

    def extract_inner_zipfile(parent_zip, child_zip_path):
        """Returns a ZipFile specified by child_zip_path that exists inside
        parent_zip.
        """
        # Zip data is binary, so buffer it in a seekable BytesIO (not StringIO)
        memory_zip = BytesIO(parent_zip.read(child_zip_path))
        return zipfile.ZipFile(memory_zip)

    # Note: names inside a zip always use '/', so splitting on os.sep
    # assumes a POSIX system where os.sep == '/'.
    if ('.zip' + os.sep) in path:
        (parent_zip_path, child_zip_path) = os.path.relpath(path).split(
            '.zip' + os.sep, 1)
        parent_zip_path += '.zip'

        if parent_zip is None:
            # This is the top-level, so read from disk
            parent_zip = zipfile.ZipFile(parent_zip_path)
        else:
            # We're already in a zip, so pull it out and recurse
            parent_zip = extract_inner_zipfile(parent_zip, parent_zip_path)

        return extract_nested_zipfile(child_zip_path, parent_zip)
    else:
        if parent_zip is not None:
            return extract_inner_zipfile(parent_zip, path)
        else:
            # If there is no nesting, it's easy!
            return zipfile.ZipFile(path)
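If you want to visit every file at every nesting depth instead of addressing one archive by path, a recursive generator along the same lines is one option. This is a sketch, not the answer author's code; `iter_nested_files` is a hypothetical name, and it assumes each inner archive is small enough to load into memory.

```python
import io
import zipfile


def iter_nested_files(zf, prefix=""):
    """Yield (path, data) for every non-zip member, descending into nested .zip members.

    `path` joins the archive names with "/", e.g. "wrap2.zip/wrap1.zip/hi.txt".
    """
    for name in zf.namelist():
        if name.endswith("/"):
            continue  # skip directory entries
        data = zf.read(name)
        full_name = prefix + name
        if name.lower().endswith(".zip"):
            # Load the inner archive into a seekable in-memory buffer and recurse.
            with zipfile.ZipFile(io.BytesIO(data)) as inner:
                for item in iter_nested_files(inner, full_name + "/"):
                    yield item
        else:
            yield full_name, data
```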

Here's how I tested it:

echo hello world > hi.txt
zip wrap1.zip hi.txt
zip wrap2.zip wrap1.zip
zip wrap3.zip wrap2.zip
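If the zip command-line tool isn't available, the same test archives can be built with zipfile itself. A sketch (`make_test_archives` is a hypothetical helper); note that zipfile stores members uncompressed by default, whereas the zip tool deflates them, which doesn't affect reading:

```python
import os
import zipfile


def make_test_archives(directory):
    """Create hi.txt, wrap1.zip, wrap2.zip and wrap3.zip in `directory`,
    mirroring the echo/zip commands above. Returns the path of wrap3.zip."""
    hi_path = os.path.join(directory, "hi.txt")
    with open(hi_path, "wb") as f:
        f.write(b"hello world\n")  # what `echo hello world > hi.txt` writes on Unix
    previous = hi_path
    for i in (1, 2, 3):
        archive = os.path.join(directory, "wrap%d.zip" % i)
        with zipfile.ZipFile(archive, "w") as zf:
            # Store only the base name, as `zip wrapN.zip <file>` does in that directory.
            zf.write(previous, arcname=os.path.basename(previous))
        previous = archive
    return previous
```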

print(extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap1.zip').open('hi.txt').read())
print(extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap2.zip/wrap1.zip').open('hi.txt').read())
print(extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap3.zip/wrap2.zip/wrap1.zip').open('hi.txt').read())

3 answers

Answer 1: score 42, accepted, 2012-08-19 09:25:03
Answer 2: score 9, 2013-11-20 14:57:29
Answer 3: score 0, 2014-06-12 20:58:56
