Opening a binary file in a zip archive as a ZipExtFile

Question

I'm trying to access a binary stream (via a ZipExtFile object) from a data file contained in a Zip archive. Incrementally reading a text file object from the archive would be fairly straightforward:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as txtfile:
        for line in txtfile:
            ....

Ideally the byte stream equivalent would be something like:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigbinary.bin", 'rb') as binfile:
        while notEOF:
            binchunk = binfile.read(MYCHUNKSIZE)
            ....

Unfortunately, ZipFile.open doesn't seem to support reading binary data to a ZipExtFile object. From the docs:

The mode parameter, if included, must be one of the following: 'r' (the default), 'U', or 'rU'.

Given this constraint, how best to incrementally read in the binary file directly from the archive? Since the uncompressed file is quite large, I'd like to avoid extracting it first.

Answer 1

I managed to solve the issue that I described in my comment to the OP. I have adapted it here for your purpose, but I think there is probably a way to just change the encoding of chunk_str and avoid using BytesIO.

Anyway - here's my code if it helps:

from io import BytesIO
from zipfile import ZipFile

MYCHUNKSIZE = 10  # bytes requested per read

archive_file = r"test_resources\0000232514_bom.zip"
src_file = r"0000232514_bom.xls"

no_of_chunks_to_read = 10  # stop after this many chunks
with ZipFile(archive_file, 'r') as zf:
    # zf.open() returns a ZipExtFile that decompresses on demand
    with zf.open(src_file) as src_f:
        while no_of_chunks_to_read > 0:
            chunk_str = src_f.read(MYCHUNKSIZE)
            # Wrap the chunk in a BytesIO stream and read it back as bytes
            chunk_stream = BytesIO(chunk_str)
            chunk_bytes = chunk_stream.read()
            print(type(chunk_bytes), len(chunk_bytes), chunk_bytes)
            if len(chunk_str) < MYCHUNKSIZE:
                # A short read means end of file
                break
            no_of_chunks_to_read -= 1
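
On Python 3 the BytesIO round trip is probably unnecessary: ZipFile.open with mode 'r' returns a binary file-like object, and read() already returns bytes. The following is a minimal sketch under that assumption. The helper name iter_zip_chunks and the default chunk size are made up for illustration and are not part of zipfile.

```python
from functools import partial
from zipfile import ZipFile


def iter_zip_chunks(archive_path, member_name, chunk_size=64 * 1024):
    """Yield successive bytes chunks of one archive member.

    Hypothetical helper for illustration; assumes Python 3.
    """
    with ZipFile(archive_path, "r") as zf:
        # Mode 'r' already gives a binary stream; the member is
        # decompressed on the fly rather than extracted to disk.
        with zf.open(member_name, "r") as member:
            # iter() with a sentinel stops at the first empty read (EOF).
            for chunk in iter(partial(member.read, chunk_size), b""):
                yield chunk
```

Used as `for binchunk in iter_zip_chunks("myziparchive.zip", "mybigbinary.bin", MYCHUNKSIZE): ...`, this keeps only one chunk in memory at a time.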

Answer 2

For line by line reading:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as binfile:
       for line in binfile:
           line = line.decode()  # bytes to str
           ...
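
An alternative to decoding each line by hand is to wrap the member in io.TextIOWrapper, which decodes incrementally. This is a rough sketch assuming Python 3 and a UTF-8 encoded member; the helper name iter_zip_lines is hypothetical, and the encoding should be adjusted to match the actual file.

```python
import io
from zipfile import ZipFile


def iter_zip_lines(archive_path, member_name, encoding="utf-8"):
    """Yield decoded text lines from one archive member.

    Hypothetical helper for illustration; the UTF-8 default is an assumption.
    """
    with ZipFile(archive_path, "r") as zf:
        with zf.open(member_name, "r") as raw:
            # TextIOWrapper turns the binary stream into a text stream,
            # decoding as it reads instead of loading the whole member.
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                for line in text:
                    yield line
```

Each yielded line is a str with its trailing newline preserved, as when iterating over a regular text file.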


Answer 1
0 2018-01-11 07:43:06

Answer 2
0 2021-04-19 18:50:57
