Open binary file in zip archive as ZipExtFile
I'm trying to access a binary stream (via a ZipExtFile object) from a data file contained in a Zip archive. Incrementally reading a text file from the archive is fairly straightforward:
from zipfile import ZipFile

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as txtfile:
        for line in txtfile:
            ...
Ideally, the byte-stream equivalent would be something like:
with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigbinary.bin", 'rb') as binfile:
        while True:  # loop until EOF
            binchunk = binfile.read(MYCHUNKSIZE)
            if not binchunk:
                break
            ...
Unfortunately, ZipFile.open doesn't seem to support reading binary data into a ZipExtFile object. From the docs:
The mode parameter, if included, must be one of the following: 'r' (the default), 'U', or 'rU'.
Given this constraint, what is the best way to read the binary file incrementally, directly from the archive? The uncompressed file is quite large, so I'd like to avoid extracting it first.
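A note on the premise, as a minimal sketch assuming Python 3.6 or later: ZipFile.open() in its default 'r' mode already returns a binary file object, so ZipExtFile.read() yields bytes and no 'rb' mode is needed. In 3.6+ the only accepted modes are 'r' and 'w', so passing 'rb' raises ValueError. The in-memory archive and its sample payload below are illustrative assumptions, not part of the question.

```python
import io
import zipfile

# Hypothetical sample archive, built in memory so the sketch runs anywhere.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("mybigbinary.bin", bytes(range(256)) * 4)

with zipfile.ZipFile(buf, "r") as zf:
    # The default mode "r" already yields a binary ZipExtFile.
    with zf.open("mybigbinary.bin") as binfile:
        first = binfile.read(16)
    print(type(first).__name__, len(first))  # bytes 16

    # Python 3.6+ only accepts "r" or "w"; "rb" raises ValueError.
    try:
        zf.open("mybigbinary.bin", "rb")
        rejected = False
    except ValueError as exc:
        rejected = True
        print("ValueError:", exc)
```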
I managed to solve the issue I described in my comment to the OP. I have adapted the solution here for your purpose, but I suspect there is a way to simply change the encoding of chunk_str and avoid using BytesIO.
Anyway, here's my code in case it helps:
from io import BytesIO
from zipfile import ZipFile

MYCHUNKSIZE = 10
archive_file = r"test_resources\0000232514_bom.zip"
src_file = r"0000232514_bom.xls"
no_of_chunks_to_read = 10

with ZipFile(archive_file, 'r') as zf:
    with zf.open(src_file) as src_f:
        while no_of_chunks_to_read > 0:
            chunk_str = src_f.read(MYCHUNKSIZE)  # up to MYCHUNKSIZE bytes
            chunk_stream = BytesIO(chunk_str)
            chunk_bytes = chunk_stream.read()
            print(type(chunk_bytes), len(chunk_bytes), chunk_bytes)
            if len(chunk_str) < MYCHUNKSIZE:
                # A short read means we reached the end of the file
                break
            no_of_chunks_to_read -= 1
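On avoiding BytesIO: in Python 3, src_f.read() already returns bytes, so wrapping each chunk in BytesIO and reading it back only reproduces the same bytes. A minimal sketch under that assumption, reading every chunk of a member through a generator. The helper name iter_member_chunks, the chunk size, and the in-memory demo archive are hypothetical choices for illustration.

```python
import io
import zipfile
from typing import Iterator


def iter_member_chunks(zf: zipfile.ZipFile, member: str,
                       chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive byte chunks of one archive member without extracting it."""
    with zf.open(member) as src:
        # read() returns b"" only at end of stream, which stops iter().
        yield from iter(lambda: src.read(chunk_size), b"")


# Hypothetical demo archive holding 25,600 bytes of sample data.
payload = bytes(range(256)) * 100
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("0000232514_bom.xls", payload)

with zipfile.ZipFile(buf) as zf:
    chunks = list(iter_member_chunks(zf, "0000232514_bom.xls", chunk_size=10_000))

print(b"".join(chunks) == payload)  # True
```

Because the generator decompresses lazily, only one chunk needs to be held in memory at a time when you consume it in a for loop instead of collecting it with list().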
For line-by-line reading:
from zipfile import ZipFile

with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as binfile:
        for line in binfile:
            line = line.decode()  # bytes to str
            ...
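Alternatively, io.TextIOWrapper can decode the binary ZipExtFile stream for you; the Python 3.6 zipfile documentation points to it as the replacement for the removed 'U' mode. A minimal sketch, assuming the member is UTF-8 text. The archive and its contents are made up for the demo.

```python
import io
import zipfile

# Hypothetical in-memory archive with a small UTF-8 text member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mybigtextfile.txt", "first line\nsecond line\n".encode("utf-8"))

with zipfile.ZipFile(buf) as zf:
    # TextIOWrapper decodes lazily, so the member is still read incrementally.
    with io.TextIOWrapper(zf.open("mybigtextfile.txt"), encoding="utf-8") as txtfile:
        lines = [line.rstrip("\n") for line in txtfile]

print(lines)  # ['first line', 'second line']
```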