Opening a binary file in a zip archive as a ZipExtFile

Question

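I am trying to access a binary stream (via a ZipExtFile object) from a data file contained in a Zip archive. Reading a text file object incrementally from the archive would be straightforward: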

from zipfile import ZipFile

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as txtfile:
        for line in txtfile:
            ...

Ideally, the byte-stream equivalent would look something like:

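with ZipFile("myziparchive.zip", 'r') as ziparchive:
    # 'rb' is the mode I would like to pass here
    with ziparchive.open("mybigbinary.bin", 'rb') as binfile:
        while True:
            binchunk = binfile.read(MYCHUNKSIZE)
            if not binchunk:  # an empty read means end of file
                break
            ...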

Unfortunately, ZipFile.open does not seem to support reading binary data into a ZipExtFile object. From the documentation:

The mode parameter, if included, must be one of the following: 'r' (the default), 'U', or 'rU'.

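Given this constraint, what is the best way to read a binary file incrementally, directly from the archive? The uncompressed file is very large, so I would like to avoid extracting it first.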

Answer 1

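I managed to solve the problem I described in my comment on the OP. I have adapted it here for your purposes, but I think there may be a way to change only the encoding of chunk_str and so avoid using BytesIO.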

Anyway, in case it helps, here is my code:

from io import BytesIO
from zipfile import ZipFile

MYCHUNKSIZE = 10

archive_file = r"test_resources\0000232514_bom.zip"
src_file = r"0000232514_bom.xls"

no_of_chunks_to_read = 10
with ZipFile(archive_file,'r') as zf:
    with zf.open(src_file) as src_f:
        while no_of_chunks_to_read > 0:
            chunk_str = src_f.read(MYCHUNKSIZE)
            chunk_stream = BytesIO(chunk_str)
            chunk_bytes = chunk_stream.read()
            print(type(chunk_bytes), len(chunk_bytes), chunk_bytes)
            if len(chunk_str) < MYCHUNKSIZE:
                # End of file
                break
            no_of_chunks_to_read -= 1
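
The answer suspects that the BytesIO step can be avoided. A minimal sketch of that idea, assuming Python 3, where ZipExtFile.read() already returns a bytes object: the helper name iter_member_chunks, the member name demo.bin, and the in-memory demo archive are illustrative and not part of the original answer.

```python
import io
from zipfile import ZipFile


def iter_member_chunks(archive, member_name, chunk_size=64 * 1024):
    """Yield successive bytes chunks from one archive member without extracting it."""
    with ZipFile(archive, 'r') as zf:
        with zf.open(member_name) as src_f:  # ZipExtFile; reads return bytes
            while True:
                chunk = src_f.read(chunk_size)
                if not chunk:  # an empty bytes object signals end of file
                    break
                yield chunk


# Build a small in-memory archive so the sketch runs on its own (hypothetical data).
buf = io.BytesIO()
with ZipFile(buf, 'w') as zf:
    zf.writestr("demo.bin", bytes(range(25)))
buf.seek(0)

chunks = list(iter_member_chunks(buf, "demo.bin", chunk_size=10))
print([len(c) for c in chunks])
```

Testing for an empty read, rather than comparing the chunk length with MYCHUNKSIZE, also covers the case where the file size is an exact multiple of the chunk size.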

Answer 2

Reading line by line:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as binfile:
       for line in binfile:
           line = line.decode()  # bytes to str
           ...
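
As a possible alternative to calling decode() on every line, the ZipExtFile can be wrapped in io.TextIOWrapper so that decoding happens incrementally. This is a sketch that assumes Python 3 and UTF-8 content; the member name demo.txt and the in-memory archive are made up for the demonstration.

```python
import io
from zipfile import ZipFile

# Build a small in-memory archive so the sketch runs on its own (hypothetical data).
buf = io.BytesIO()
with ZipFile(buf, 'w') as zf:
    zf.writestr("demo.txt", "first line\nsecond line\n")
buf.seek(0)

lines = []
with ZipFile(buf, 'r') as ziparchive:
    with ziparchive.open("demo.txt", 'r') as binfile:
        # TextIOWrapper decodes the byte stream as it is read, so each
        # iteration yields a str instead of bytes.
        with io.TextIOWrapper(binfile, encoding="utf-8") as txtfile:
            for line in txtfile:
                lines.append(line.rstrip("\n"))

print(lines)
```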
