How do I read selected files from a remote Zip archive over HTTP using Python?

I need to read selected files, matched by file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file; it's not that large, so I can handle everything in memory.

I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.

Here's how I did it (grabbing all files ending in ".ranks"):

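# Python 2
import urllib2, cStringIO, zipfile

try:
    remotezip = urllib2.urlopen(url)  # url is assumed to be defined elsewhere
    # Buffer the whole response so zipfile gets a seekable file-like object
    zipinmemory = cStringIO.StringIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # renamed from "zip" to avoid shadowing the built-in
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            ranks_data = archive.read(fn)
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib2.HTTPError:
    pass  # handle exception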

This will do the job without downloading the entire zip file!

http://pypi.python.org/pypi/pyremotezip
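
For anyone curious how reading part of a remote zip can work at all, here is a rough sketch of one general approach: HTTP Range requests. It is not pyremotezip's API. `RangeReader`, `http_range_fetcher` and `remote_size` are hypothetical names made up for this example, and it assumes the server honours `Range` headers and reports `Content-Length` on a `HEAD` request. `zipfile` only needs a seekable file-like object, so it ends up fetching the central directory at the end of the archive plus the members you actually read.

```python
import io
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file-like object backed by a byte-range fetcher.

    fetch_range(start, end) must return the bytes from start to end inclusive.
    """

    def __init__(self, size, fetch_range):
        super().__init__()
        self._size = size
        self._fetch_range = fetch_range
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            # Real files raise OSError here, and zipfile relies on that.
            raise OSError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        count = min(len(buffer), self._size - self._pos)
        if count <= 0:
            return 0  # end of file
        data = self._fetch_range(self._pos, self._pos + count - 1)
        n = len(data)
        buffer[:n] = data
        self._pos += n
        return n


def http_range_fetcher(url):
    """Build a fetch_range callable for url (assumes the server supports Range)."""
    def fetch_range(start, end):
        request = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(request) as response:
            if response.status != 206:
                raise OSError("server did not return partial content")
            return response.read()
    return fetch_range


def remote_size(url):
    """Ask for the archive size with a HEAD request (assumes Content-Length is sent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])
```

Usage would look something like `zipfile.ZipFile(RangeReader(remote_size(url), http_range_fetcher(url)))`, after which `namelist()` and `read(fn)` work as in the question. Every read becomes its own HTTP request, so for a small archive the plain download in the question is probably simpler and faster.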

Thanks Marcel for your question and answer (I had the same problem in a different context and ran into the same difficulty with file-like objects not really being file-like)! Just as an update: for Python 3.0, your code needs to be modified slightly:

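import urllib.request, urllib.error, io, zipfile

try:
    remotezip = urllib.request.urlopen(url)  # url is assumed to be defined elsewhere
    zipinmemory = io.BytesIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # renamed from "zip" to avoid shadowing the built-in
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            # read() returns bytes in Python 3; decode before splitting (UTF-8 is an assumption)
            ranks_data = archive.read(fn).decode("utf-8")
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib.error.HTTPError:
    pass  # handle exception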
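
If it helps, the per-line loop can also be pulled into a small generator that streams lines from each matching member instead of splitting one big string. This is only a sketch: `iter_member_lines` is a hypothetical helper name, it takes the already-downloaded bytes (for example `remotezip.read()`), and the UTF-8 default is an assumption about the file contents.

```python
import io
import zipfile


def iter_member_lines(zip_bytes, suffix, encoding="utf-8"):
    """Yield (member_name, line) for every member whose name ends with suffix."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue
            with archive.open(name) as member:
                # TextIOWrapper decodes the member lazily, line by line
                for line in io.TextIOWrapper(member, encoding=encoding):
                    yield name, line.rstrip("\n")
```

Keeping the download out of the helper means it can be tried against a zip built in memory, with no network involved.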

Keep in mind that merely decompressing a ZIP file can open a security hole.
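
The comment doesn't say which hole it means. One well-known risk with untrusted archives is a "zip bomb", where a tiny member expands to an enormous amount of data. Assuming that is the concern, a minimal guard is to cap how many bytes you are willing to decompress per member. `read_member_capped` and the `max_bytes` limit are hypothetical names for this sketch, not part of `zipfile`.

```python
import io
import zipfile


def read_member_capped(archive, name, max_bytes):
    """Read one member, refusing to return more than max_bytes of output."""
    with archive.open(name) as member:
        # Ask for one byte more than allowed so an oversized member is detectable
        data = member.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("%s expands beyond %d bytes" % (name, max_bytes))
    return data
```

This stops decompressing shortly after the limit is crossed instead of trusting the size recorded in the archive, which a hostile file could misreport. In the loop above it would replace the direct `read(fn)` call.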
