
Open binary file in zip archive as ZipExtFile

I'm trying to access a binary stream (via a ZipExtFile object) from a data file contained in a Zip archive. Incrementally reading a text file from the archive is fairly straightforward:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as txtfile:
        for line in txtfile:
            ....

Ideally the byte stream equivalent would be something like:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigbinary.bin", 'rb') as binfile:
        while notEOF:
            binchunk = binfile.read(MYCHUNKSIZE)
            ....

Unfortunately, ZipFile.open doesn't seem to accept a binary mode such as 'rb' when returning a ZipExtFile object. From the docs:

The mode parameter, if included, must be one of the following: 'r' (the default), 'U', or 'rU'.

Given this constraint, what is the best way to read the binary file incrementally, directly from the archive? Since the uncompressed file is quite large, I'd like to avoid extracting it first.

I managed to solve the issue that I described in my comment to the OP. I have adapted it here for your purpose, but I think there is probably a way to just change the encoding of chunk_str and avoid using BytesIO.

Anyway - here's my code if it helps:

from io import BytesIO
from zipfile import ZipFile

MYCHUNKSIZE = 10

archive_file = r"test_resources\0000232514_bom.zip"
src_file = r"0000232514_bom.xls"

no_of_chunks_to_read = 10
with ZipFile(archive_file, 'r') as zf:
    # zf.open() returns a ZipExtFile for the named member
    with zf.open(src_file) as src_f:
        while no_of_chunks_to_read > 0:
            chunk_str = src_f.read(MYCHUNKSIZE)
            # Wrap the chunk in BytesIO and read it back as a byte stream
            chunk_stream = BytesIO(chunk_str)
            chunk_bytes = chunk_stream.read()
            print(type(chunk_bytes), len(chunk_bytes), chunk_bytes)
            if len(chunk_str) < MYCHUNKSIZE:
                # End of file
                break
            no_of_chunks_to_read -= 1
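
In Python 3, as far as I can tell, read() on the object returned by ZipFile.open already gives you bytes, so the BytesIO round trip may not be needed at all. Here is a minimal sketch under that assumption. The helper name iter_zip_member_chunks and the small in-memory demo archive are made up for illustration; the file name is borrowed from the question.

```python
import io
import zipfile
from functools import partial


def iter_zip_member_chunks(archive, member, chunk_size=64 * 1024):
    """Yield successive bytes chunks of one archive member without extracting it.

    `archive` may be a path or a seekable file-like object.
    """
    with zipfile.ZipFile(archive, "r") as zf:
        with zf.open(member, "r") as src:
            # read() returns b"" at end of file, which is the sentinel
            # that stops iter().
            for chunk in iter(partial(src.read, chunk_size), b""):
                yield chunk


# Demo: build a small archive in memory so the sketch runs on its own.
payload = bytes(range(256)) * 4
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("mybigbinary.bin", payload)
buf.seek(0)

chunks = list(iter_zip_member_chunks(buf, "mybigbinary.bin", chunk_size=100))
print(len(payload), sum(len(c) for c in chunks))
```

Only one chunk is held in memory at a time, so this should work for a large member as long as chunk_size is reasonable.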

For line by line reading:

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as binfile:
       for line in binfile:
           line = line.decode()  # bytes to str
           ...
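
If you would rather not call decode() on every line, another option is to wrap the ZipExtFile in io.TextIOWrapper and let it do the decoding. This is only a sketch, assuming Python 3 and a UTF-8 encoded member; the helper name iter_zip_member_lines and the in-memory demo archive are hypothetical.

```python
import io
import zipfile


def iter_zip_member_lines(archive, member, encoding="utf-8"):
    """Yield decoded text lines from one archive member, one at a time."""
    with zipfile.ZipFile(archive, "r") as zf:
        with zf.open(member, "r") as raw:
            # TextIOWrapper decodes the byte stream incrementally,
            # so the whole member is never loaded at once.
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                for line in text:
                    yield line.rstrip("\n")


# Demo: build a small archive in memory so the sketch runs on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mybigtextfile.txt", "first\nsecond\nthird\n".encode("utf-8"))
buf.seek(0)

lines = list(iter_zip_member_lines(buf, "mybigtextfile.txt"))
print(lines)
```

Pass a different encoding argument if the file inside the archive is not UTF-8.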

