
Read xml files directly from a zip file using Python

I have the following zip file structure:

some_file.zip/folder/folder/files.xml

So I have a lot of xml files within a subfolder of the zip file.

So far I have managed to unpack the zip file using the following code:

import os.path
import zipfile

with zipfile.ZipFile('some_file.zip') as zf:
    for member in zf.infolist():
        # Path traversal defense copied from
        # http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789
        words = member.filename.split('/')
        path = "output"
        for word in words[:-1]:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir, ''): continue
            path = os.path.join(path, word)

        zf.extract(member, path)

However, I do not need to extract the files; I want to read them directly from the zip file. So I would either read each file within a for loop and process it, or save each file in some kind of data structure in Python. Is this possible?

zf.open() will return a file-like object without extracting it.
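To make that concrete, here is a minimal sketch of reading one member's bytes straight out of the archive. The helper name `read_member` is hypothetical (it is not part of `zipfile`); the member path follows the layout from the question.

```python
import zipfile


def read_member(zip_path, member_name):
    """Return the raw bytes of one archive member without extracting it to disk.

    `read_member` is an illustrative helper, not a zipfile API.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # zf.open() yields a binary file-like object backed by the archive
        with zf.open(member_name) as f:
            return f.read()
```

Usage would look like `data = read_member('some_file.zip', 'folder/folder/files.xml')`. Note that the result is `bytes`, so decode it if you need text.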

As Robin Davis has written, zf.open() will do the trick. Here is a small example:

import zipfile

with zipfile.ZipFile('some_file.zip', 'r') as zf:
    for name in zf.namelist():
        # Skip directory entries
        if name.endswith('/'):
            continue

        if 'folder2/' in name:
            with zf.open(name) as f:
                # Do your processing with f here: parsing, etc.
                # This prints the raw file contents (bytes)
                print(f.read())

As the OP wished in a comment, only files from "folder2" will be processed.
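If the goal is to keep every file in a data structure, as the question mentions, one possible approach is to parse each XML member and store the root elements in a dict keyed by member name. This is only a sketch: the function name `parse_xml_members` and its `prefix` parameter are assumptions made for illustration, and it filters with `startswith` rather than the substring test used above.

```python
import xml.etree.ElementTree as ET
import zipfile


def parse_xml_members(zip_path, prefix=""):
    """Parse every .xml member whose name starts with `prefix`.

    Returns a dict mapping member name to the parsed root element.
    Hypothetical helper, not part of zipfile or ElementTree.
    """
    roots = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Skip directory entries and non-XML files
            if name.endswith("/") or not name.endswith(".xml"):
                continue
            if not name.startswith(prefix):
                continue
            # ET.parse accepts the file-like object returned by zf.open()
            with zf.open(name) as f:
                roots[name] = ET.parse(f).getroot()
    return roots
```

For example, `parse_xml_members('some_file.zip', prefix='folder/folder2/')` would return only the XML files under that path (the exact prefix depends on your archive layout). Nothing is written to disk, but every parsed tree is held in memory, so for very large archives processing files one at a time inside the loop may be preferable.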
