I have a block of bz2-compressed CSV data in memory:
    compressed = load_from_network_service(...)
I would like to iterate over a stream of decompressed lines:

    for line in bz2_decompress_stream(compressed):
        ...
Does such a function exist?
In principle I could write it to disk and then read it back using bz2.BZ2File, which only seems to want to consume a filename:
    import bz2

    with open('tmp', 'wb') as f:
        f.write(compressed)
    with bz2.BZ2File('tmp') as f:
        for line in f:
            ...
But for my current application, disk I/O is at a premium, so this is a pain.
Presumably the bz2.BZ2Decompressor object might be helpful here. My experience with it is that I give it my compressed data and it gives me the entire decompressed result; it doesn't seem to stream. Perhaps this is a limitation of my data?
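Roughly what I have tried is:

    import bz2

    decompressor = bz2.BZ2Decompressor()
    # This hands back the entire decompressed blob in one go:
    data = decompressor.decompress(compressed)
    for line in data.splitlines():
        ...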
There are two distinct problems here:

1. obtaining the compressed data incrementally rather than all at once, and
2. decompressing it and splitting it into lines as it arrives.
In order to solve 2., you are right that you can use bz2.BZ2Decompressor. But a solution to 1. entirely depends on what exactly your first line

    compressed = load_from_network_service(...)

really returns. If compressed is a plain bytes string, then there is not much you can do: you have to wait until you have retrieved it all, and then decompress. If, on the other hand, it is a file-like object that is filled incrementally (an io.BytesIO rather than a StringIO, since bz2 data is bytes), then you can do something like this (untested):
    import bz2

    decompressor = bz2.BZ2Decompressor()
    decompressed = b''
    while True:
        compressed_chunk = compressed.read(100)
        # Can be empty (even before the stream is exhausted):
        decompressed_chunk = decompressor.decompress(compressed_chunk)
        if decompressed_chunk:
            decompressed += decompressed_chunk
            # The last element is the partial line after the final newline
            # (empty if the data ended exactly on a newline); keep it for
            # the next iteration:
            new_lines = decompressed.split(b'\n')
            decompressed = new_lines[-1]
            for line in new_lines[:-1]:
                do_something(line)
        if len(compressed_chunk) < 100 or decompressor.eof:
            # Reached EOF
            break
    if decompressed:
        # The data did not end with a newline: flush the last line too
        do_something(decompressed)
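If you want the bz2_decompress_stream interface from your question, the same loop can be wrapped into a generator. This is just a sketch along the same lines (also untested); stream is assumed to be any file-like object whose read() returns bz2-compressed bytes, and chunk_size is an arbitrary buffer size:

    import bz2
    import io

    def bz2_decompress_stream(stream, chunk_size=100):
        """Yield decompressed lines (as bytes, without the trailing newline)
        from a file-like object that serves bz2-compressed bytes."""
        decompressor = bz2.BZ2Decompressor()
        pending = b''
        while True:
            chunk = stream.read(chunk_size)
            pending += decompressor.decompress(chunk)
            lines = pending.split(b'\n')
            pending = lines[-1]          # partial line, kept for next round
            for line in lines[:-1]:
                yield line
            if len(chunk) < chunk_size or decompressor.eof:
                break
        if pending:
            # The data did not end with a newline:
            yield pending

    # If compressed is a complete bytes object (the in-memory case from the
    # question), it can be wrapped as well:
    for line in bz2_decompress_stream(io.BytesIO(compressed)):
        do_something(line)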