
Decompress streaming BZ2 from memory in Python

I have a block of bz2-compressed CSV data in memory:

compressed = load_from_network_service(...)

I would like to iterate over a stream of decompressed lines.

for line in bz2_decompress_stream(compressed):
    ...

Does such a function exist?

In principle I could write the data to disk and then read it back with bz2.BZ2File, which only seems to want to consume a filename:

import bz2

# Compressed data is binary, so the file must be opened in 'wb' mode
with open('tmp', 'wb') as f:
    f.write(compressed)
with bz2.BZ2File('tmp') as f:
    for line in f:
        ...

But for my current application disk I/O is at a premium, so this is a pain.

Presumably the bz2.BZ2Decompressor object might be helpful here. My experience with it is that I give it my compressed data and it gives me the entire decompressed result; it doesn't seem to stream. Perhaps this is a limitation of my data?

There are two distinct problems:

  1. streaming
  2. not writing to disk

In order to solve 2., you are right that you can use bz2.BZ2Decompressor. But a solution to 1. depends entirely on what exactly your first line

compressed = load_from_network_service(...)

really returns. If compressed is a string (a bytes object in Python 3), then there is not much you can do: you have to wait until you have retrieved it all, and then decompress. If instead it is, for instance, an incrementally "filled" StringIO (io.BytesIO in Python 3), then you can do something like (untested):

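import bz2

decompressor = bz2.BZ2Decompressor()
decompressed = b''  # Decompressed bytes not yet terminated by a newline
while True:
    compressed_chunk = compressed.read(100)
    if not compressed_chunk:
        # Reached EOF
        break
    # Can be empty (even before the stream is exhausted):
    decompressed_chunk = decompressor.decompress(compressed_chunk)
    if decompressed_chunk:
        decompressed += decompressed_chunk
        # split(b'\n') keeps a trailing partial line as the last element
        new_lines = decompressed.split(b'\n')
        decompressed = new_lines[-1]
        for line in new_lines[:-1]:
            do_something(line)
if decompressed:
    # Last line, if the data does not end with a newline
    do_something(decompressed)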
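
The same idea can be wrapped in a generator, so that it reads like the bz2_decompress_stream(compressed) call from the question. This is only a minimal sketch under a few assumptions: it targets Python 3, the data is a single bz2 stream, and the names iter_bz2_lines and chunked are hypothetical helpers, not part of the bz2 module. Note also that bz2 compresses in blocks, so the decompressor typically returns nothing until a whole block has been fed in. For small inputs the output may therefore arrive in one piece, which could be why BZ2Decompressor did not seem to stream.

```python
import bz2


def iter_bz2_lines(chunks):
    """Yield decompressed lines (bytes, newline stripped) from an
    iterable of compressed byte chunks.

    Hypothetical helper for illustration; assumes a single bz2 stream.
    """
    decompressor = bz2.BZ2Decompressor()
    pending = b""  # decompressed bytes not yet terminated by a newline
    for chunk in chunks:
        if decompressor.eof:
            break  # ignore anything after the end of the bz2 stream
        pending += decompressor.decompress(chunk)
        # Everything before the last b"\n" is a complete line;
        # the remainder is kept until more data arrives.
        *complete, pending = pending.split(b"\n")
        yield from complete
    if pending:
        # Final line without a trailing newline
        yield pending


def chunked(data, size=4096):
    """Split an in-memory bytes object into fixed-size pieces
    (hypothetical helper standing in for a real network stream)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]
```

With the data already in memory, usage would look like for line in iter_bz2_lines(chunked(compressed)): ..., and nothing is written to disk. If the network service can hand over the data piece by piece, those pieces can be passed in directly instead of going through chunked.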
