Read XML files directly from a zip file using Python

Question

I have the following zip file structure:

some_file.zip/folder/folder/files.xml

So I have a lot of XML files within a subfolder of the zip file.

So far I have managed to unpack the zip file using the following code:

import os.path
import zipfile

with zipfile.ZipFile('some_file.zip') as zf:
    for member in zf.infolist():
        # Path traversal defense copied from
        # http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789
        words = member.filename.split('/')
        path = "output"
        for word in words[:-1]:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir, ''): continue
            path = os.path.join(path, word)

        zf.extract(member, path)

But I do not need to extract the files; I need to read them directly from the zip file. So I would either read each file within a for loop and process it, or save each file in some kind of data structure in Python. Is it possible?

Answer 1

zf.open() will return a file-like object without extracting it.
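Since the files in question are XML, here is a minimal sketch of how that file-like object could be handed straight to a parser. It assumes the standard library's xml.etree.ElementTree is an acceptable parser. The in-memory archive and its XML content are made up so the snippet runs on its own; only the member path is taken from the question.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a small in-memory archive so the sketch is self-contained.
# The XML content is hypothetical, for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("folder/folder/files.xml", "<root><item>1</item></root>")

buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    # zf.open() yields a binary file-like object; nothing is written to disk.
    with zf.open("folder/folder/files.xml") as f:
        root = ET.parse(f).getroot()

print(root.tag)
```

With a real archive, the BytesIO buffer would be replaced by the path to the zip file, for example 'some_file.zip'.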

Answer 2

As Robin Davis wrote, zf.open() will do the trick. Here is a small example:

import zipfile

with zipfile.ZipFile('some_file.zip', 'r') as zf:
    for name in zf.namelist():
        # Skip directory entries, which end with a slash.
        if name.endswith('/'):
            continue

        # Only process members located under "folder2".
        if 'folder2/' in name:
            with zf.open(name) as f:
                # Do your processing with f here: parsing, etc.
                # This prints the raw file contents (bytes).
                print(f.read())

As the OP requested in a comment, only files from "folder2" will be processed.
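The question also asks about saving each file in some kind of data structure. One possible sketch, under stated assumptions, collects the parsed root element of every XML member into a dict keyed by member name. The helper read_xml_members, its prefix parameter, and the demo archive are hypothetical names invented for this illustration. It also matches on a path prefix rather than the substring test used above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_xml_members(archive, prefix=""):
    """Return {member name: parsed root element} for .xml members under prefix.

    Hypothetical helper: `archive` is a path or a binary file-like object.
    """
    roots = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            if not info.filename.startswith(prefix):
                continue  # outside the requested subfolder
            if not info.filename.lower().endswith(".xml"):
                continue  # not an XML file
            with zf.open(info) as f:
                roots[info.filename] = ET.parse(f).getroot()
    return roots


# Demo archive built in memory; names and contents are made up.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("folder/folder2/a.xml", "<a/>")
    zf.writestr("folder/folder2/b.xml", "<b><c/></b>")
    zf.writestr("folder/other/readme.txt", "not xml")
demo.seek(0)

roots = read_xml_members(demo, prefix="folder/folder2/")
print(sorted(roots))
```

Holding every parsed tree in memory is convenient for small archives. For large ones, processing each member inside the loop is likely the safer choice.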

Answer 1: 3, 2016-02-14 19:07:05
Answer 2: 3, accepted, 2016-02-14 19:28:03