How can I read selected files from a remote Zip archive over HTTP using Python?

Question

I need to read selected files, matched by file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file (it's not that large, so I can handle everything in memory).

I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.

Answer 1

Here's how I did it (grabbing all files ending in ".ranks"):

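import urllib2, cStringIO, zipfile

# url is assumed to hold the address of the remote zip archive
try:
    remotezip = urllib2.urlopen(url)
    # Buffer the whole download in memory; ZipFile needs a seekable file object
    zipinmemory = cStringIO.StringIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # renamed to avoid shadowing the built-in zip()
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            ranks_data = archive.read(fn)
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib2.HTTPError:
    pass  # handle exception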

Answer 2

This will do the job without downloading the entire zip file!

http://pypi.python.org/pypi/pyremotezip
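
For readers curious how a partial download can work at all, here is a minimal sketch of the general idea using HTTP Range requests. It is not pyremotezip's API: the names `http_range_fetcher` and `RangeFile` are hypothetical, the code targets Python 3, and it assumes the server answers a HEAD request with a Content-Length header and honors Range headers. `zipfile.ZipFile` only needs a seekable file-like object, so it ends up requesting the end-of-archive records and the members you actually read.

```python
import io
import urllib.request
import zipfile


def http_range_fetcher(url):
    """Return (size, fetch); fetch(start, end) returns bytes start..end inclusive.

    Assumes the server reports Content-Length and supports Range requests.
    """
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start, end):
        req = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # 206 = Partial Content
                raise OSError("server did not honor the Range header")
            return resp.read()

    return size, fetch


class RangeFile(io.RawIOBase):
    """Read-only, seekable file object that loads bytes on demand via fetch()."""

    def __init__(self, size, fetch):
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if pos < 0:
            raise OSError("negative seek position")
        self._pos = pos
        return self._pos

    def readinto(self, buffer):
        # Called by RawIOBase.read(); fill the buffer with the next byte range.
        if self._pos >= self._size or len(buffer) == 0:
            return 0
        end = min(self._pos + len(buffer), self._size) - 1
        data = self._fetch(self._pos, end)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


# Usage (hypothetical URL and member name):
# size, fetch = http_range_fetcher("http://example.com/archive.zip")
# with zipfile.ZipFile(RangeFile(size, fetch)) as archive:
#     ranks_data = archive.read("some.ranks")
```

Each read turns into its own HTTP request here, so for archives with many small members a fully buffered download may still be the simpler choice.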

Answer 3

Thanks Marcel for your question and answer (I had the same problem in a different context and ran into the same difficulty with file-like objects not really being file-like)! Just as an update: for Python 3.0, your code needs to be modified slightly:

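import urllib.request, urllib.error, io, zipfile

# url is assumed to hold the address of the remote zip archive
try:
    remotezip = urllib.request.urlopen(url)
    zipinmemory = io.BytesIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # renamed to avoid shadowing the built-in zip()
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            # read() returns bytes in Python 3, so decode before splitting on "\n"
            # (UTF-8 is an assumption about the file contents)
            ranks_data = archive.read(fn).decode("utf-8")
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib.error.HTTPError:
    pass  # handle exception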
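
As a possible variation on the Python 3 approach above, the matching and line handling can be wrapped in a small generator that closes the archive and its members automatically. This is only a sketch: the function name `iter_matching_lines` is made up for illustration, and the UTF-8 default is an assumption about the file contents.

```python
import io
import zipfile


def iter_matching_lines(zip_bytes, suffix, encoding="utf-8"):
    """Yield (member_name, line) for every member whose name ends with suffix.

    zip_bytes is the complete archive as bytes, e.g. the result of
    urllib.request.urlopen(url).read().
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue
            # TextIOWrapper decodes the member lazily, line by line
            with io.TextIOWrapper(archive.open(name), encoding=encoding) as text:
                for line in text:
                    yield name, line.rstrip("\n")
```

Iterating the wrapped member avoids building one large decoded string per file, which mostly matters for bigger members.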

Answer 4

Keep in mind that merely decompressing a ZIP file can introduce a security vulnerability.
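
One concrete risk when reading members into memory is an archive whose entries expand to far more data than expected. A minimal sketch of one mitigation follows, assuming a per-member size cap is acceptable; the function name `read_members_safely` and the 10 MB default are hypothetical choices, and the check relies on the size the archive declares for each member.

```python
import io
import zipfile

MAX_MEMBER_BYTES = 10 * 1024 * 1024  # hypothetical cap; tune for your data


def read_members_safely(zip_bytes, suffix, max_member_bytes=MAX_MEMBER_BYTES):
    """Return {name: bytes} for members ending with suffix, refusing oversized ones."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.endswith(suffix):
                continue
            # file_size is the uncompressed size recorded in the archive
            if info.file_size > max_member_bytes:
                raise ValueError(
                    "%s declares %d bytes, over the %d byte limit"
                    % (info.filename, info.file_size, max_member_bytes))
            results[info.filename] = archive.read(info)
    return results
```

Because the data stays in memory and nothing is extracted to disk, member names are only used for matching here; they should still not be trusted as file system paths.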

Answer 1: score 8, accepted, 2008-09-18 17:03:42
Answer 2: score 3, 2013-01-22 14:43:27
Answer 3: score 3, 2009-06-04 20:13:44
Answer 4: score 1, 2008-09-18 17:07:38