
unpack bz2 url without temporary file in python

I want to unpack data from a bz2 URL directly to the target file, without a temporary file. Here is the code:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(bz2.decompress(chunk))
fp.close()

It fails on bz2.decompress(chunk) with ValueError: couldn't find end of stream.
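The failure can be reproduced without any network I/O. In this sketch a locally compressed byte string stands in for the download, and only its first half is handed to the one-shot bz2.decompress(), mimicking the first chunk read from the socket:

```python
import bz2

# Stand-in for the download: compress some data, then keep only the
# first half, as if it were the first chunk read from the response.
payload = bz2.compress(b'example data ' * 1000)
partial_chunk = payload[:len(payload) // 2]

try:
    bz2.decompress(partial_chunk)
    print('decompressed a partial chunk (unexpected)')
except (ValueError, EOFError, OSError) as exc:
    # One-shot decompress needs the whole stream, so a partial chunk fails.
    print('partial chunk rejected: %s' % type(exc).__name__)
```

This is exactly the situation in the loop above: each chunk is an arbitrary slice of the stream, not a complete bz2 stream.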

bz2.decompress() expects a complete compressed stream, not an arbitrary slice of one. Use bz2.BZ2Decompressor to do sequential decompression:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

decompressor = bz2.BZ2Decompressor()
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(decompressor.decompress(chunk))
req.close()

BTW, you don't need to call fp.close() as long as you use a with statement.
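A self-contained sketch (no network; the URL in the answer is a placeholder) showing that BZ2Decompressor recovers the original data even when fed arbitrary-sized chunks:

```python
import bz2

data = b'hello world\n' * 10000
compressed = bz2.compress(data)

decompressor = bz2.BZ2Decompressor()
restored = b''
CHUNK = 64  # deliberately tiny, to force many partial feeds
for i in range(0, len(compressed), CHUNK):
    restored += decompressor.decompress(compressed[i:i + CHUNK])

assert restored == data  # round-trip succeeds despite chunking
```

Unlike the one-shot function, the decompressor object keeps internal state between calls, which is what makes chunked input work.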

You should use BZ2Decompressor, which supports incremental decompression; see https://docs.python.org/2/library/bz2.html#bz2.BZ2Decompressor

I haven't debugged this, but it should work like this:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

decompressor = bz2.BZ2Decompressor()

with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        decomp = decompressor.decompress(chunk)
        if decomp:
            fp.write(decomp)
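On Python 3 (urllib.request instead of urllib2), bz2.open accepts an existing file object, so the HTTP response can be wrapped directly and copied with shutil. A sketch, with example.com as a placeholder URL:

```python
import bz2
import shutil
import urllib.request

url = 'http://example.com/file.bz2'  # placeholder URL
filename = 'temp.file'

with urllib.request.urlopen(url) as resp:
    # bz2.open wraps the response object and decompresses transparently.
    with bz2.open(resp) as src, open(filename, 'wb') as fp:
        shutil.copyfileobj(src, fp)
```

shutil.copyfileobj reads and writes in fixed-size chunks, so the whole file is never held in memory.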

Here's a more direct way using requests in streaming mode. Note that the raw response body is the still-compressed bytes, so the data must still go through a decompressor before being written:

import bz2
import requests

filename = 'temp.file'
req = requests.get('http://example.com/file.bz2', stream=True)
decompressor = bz2.BZ2Decompressor()
with open(filename, 'wb') as fp:
    for chunk in req.iter_content(chunk_size=16 * 1024):
        fp.write(decompressor.decompress(chunk))
