
Python bz2 sequential compressor produces invalid data stream on low compression levels

I have a series of strings in a list named 'lines' and I compress them as follows:

import bz2
compressor = bz2.BZ2Compressor(compressionLevel)
for l in lines:
    compressor.compress(l)
compressedData = compressor.flush()
decompressedData = bz2.decompress(compressedData)

When compressionLevel is set to 8 or 9, this works fine. When it's any number between 1 and 7 (inclusive), the final line fails with an IOError: invalid data stream. The same occurs if I use the sequential decompressor. However, if I join the strings into one long string and use the one-shot compressor function, it works fine:

import bz2
compressedData = bz2.compress("\n".join(lines))
decompressedData = bz2.decompress(compressedData)
# Works perfectly

Do you know why this would be and how to make it work at lower compression levels?

You are throwing away the compressed data returned by compressor.compress(l). The docs say: "Returns a chunk of compressed data if possible, or an empty byte string otherwise." You need to keep every chunk it returns, with something like this:

# setup code goes here
for l in lines:
    chunk = compressor.compress(l)
    if chunk:
        do_something_with(chunk)
chunk = compressor.flush()
if chunk:
    do_something_with(chunk)
# teardown code goes here
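
As a minimal sketch of the pattern above, assuming Python 3 and that "do something" simply means collecting the chunks in a list: the helper name compress_sequentially and the sample data are made up for illustration and are not from the question.

```python
import bz2

def compress_sequentially(lines, level=1):
    """Feed byte strings to a BZ2Compressor and keep every chunk it returns."""
    compressor = bz2.BZ2Compressor(level)
    chunks = []
    for line in lines:
        # compress() may return b"" while the data is still buffered internally
        chunks.append(compressor.compress(line))
    # flush() returns whatever is still buffered and finishes the stream
    chunks.append(compressor.flush())
    return b"".join(chunks)

# Hypothetical sample input (over 1 MB in total), not taken from the question
lines = [b"line %d of sample text\n" % i for i in range(50000)]

compressed = compress_sequentially(lines, level=1)
assert bz2.decompress(compressed) == b"".join(lines)
```

Joining the chunks with b"".join() keeps the empty ones harmless, so the "if chunk:" test is optional when the chunks are only being concatenated.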

Also note that your one-shot code uses "\n".join(), which inserts newlines between the strings. To check it against the chunked result, use "".join() instead.

Also beware of bytes/str issues, e.g. the above should be b"whatever".join() if the data is bytes.
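
A short sketch of both points, assuming Python 3 (where the bz2 compressor takes bytes rather than str). The sample strings and the UTF-8 encoding are assumptions for illustration only.

```python
import bz2

lines = ["first line", "second line", "third line"]  # hypothetical str input

# Encode each str to bytes before handing it to the compressor
encoded = [s.encode("utf-8") for s in lines]

compressor = bz2.BZ2Compressor(1)
# The join consumes every compress() result before flush() is called
chunked = b"".join(compressor.compress(b) for b in encoded) + compressor.flush()

roundtrip = bz2.decompress(chunked)
assert roundtrip == b"".join(encoded)    # matches the plain concatenation
assert roundtrip != b"\n".join(encoded)  # differs from the newline-joined input

# The one-shot function agrees once it is given the same concatenation
assert bz2.decompress(bz2.compress(b"".join(encoded), 1)) == roundtrip
```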

What version of Python are you using?
