I want to unpack data from a bz2 URL directly into a target file. Here is the code:
import bz2
import urllib2  # Python 2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk: break
        fp.write(bz2.decompress(chunk))
fp.close()
It fails at bz2.decompress(chunk) with ValueError: couldn't find end of stream
bz2.decompress() expects a complete bz2 stream, and a single 16 KB chunk is not one. Use bz2.BZ2Decompressor to do sequential decompression:
import bz2
import urllib2  # Python 2; in Python 3 use urllib.request

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

# The decompressor object keeps its state between calls.
decompressor = bz2.BZ2Decompressor()
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(decompressor.decompress(chunk))
req.close()
BTW, you don't need to call fp.close() as long as you use the with statement.
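To see the difference without touching the network, here is a small sketch that compresses some in-memory data and then decompresses it both ways. The sample data and the chunk size are made up for illustration, and the code assumes Python 3.

```python
import bz2

# Hypothetical sample data standing in for the downloaded file.
original = b"".join(str(i).encode("ascii") for i in range(50000))
compressed = bz2.compress(original)

# The one-shot function needs the whole stream; a partial stream fails.
try:
    bz2.decompress(compressed[:len(compressed) // 2])
    one_shot_failed = False
except (ValueError, OSError):
    one_shot_failed = True

# The incremental decompressor accepts the same data in arbitrary slices.
decompressor = bz2.BZ2Decompressor()
CHUNK = 1024
parts = []
for start in range(0, len(compressed), CHUNK):
    parts.append(decompressor.decompress(compressed[start:start + CHUNK]))
restored = b"".join(parts)

print(one_shot_failed, restored == original)
```

Some calls to decompress() may return an empty bytes object, because the decompressor buffers input until it has enough to produce output.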
You should use BZ2Decompressor, which supports incremental decompression. See https://docs.python.org/2/library/bz2.html#bz2.BZ2Decompressor
I haven't debugged this, but it should work like this:
import bz2
import urllib2  # Python 2; in Python 3 use urllib.request

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

decompressor = bz2.BZ2Decompressor()
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk: break
        decomp = decompressor.decompress(chunk)
        # decompress() may return nothing until enough input has arrived
        if decomp:
            fp.write(decomp)
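The same loop can be wrapped in a function that works on any pair of file-like objects. The sketch below is only an illustration: decompress_stream is a hypothetical name, it assumes Python 3.3+ (for the decompressor's eof attribute) and a single bz2 stream, and raising on truncated input is my own choice, not something from the code above.

```python
import bz2
import io


def decompress_stream(src, dst, chunk_size=16 * 1024):
    """Read bz2 data from src in chunks and write the decompressed bytes to dst.

    Hypothetical helper: handles one bz2 stream and returns the number of
    decompressed bytes written.
    """
    decompressor = bz2.BZ2Decompressor()
    written = 0
    while not decompressor.eof:
        chunk = src.read(chunk_size)
        if not chunk:
            # Input ran out before the end-of-stream marker was seen.
            raise ValueError("compressed data ended before end of stream")
        data = decompressor.decompress(chunk)
        if data:
            dst.write(data)
            written += len(data)
    return written


# Usage with in-memory stand-ins for the HTTP response and the target file.
payload = b"example line\n" * 5000
src = io.BytesIO(bz2.compress(payload))
dst = io.BytesIO()
count = decompress_stream(src, dst, chunk_size=32)
```

Unlike the plain while True loop, this version reports an incomplete download instead of silently leaving a short file behind.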
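Here's a more direct way using requests in streaming mode. Note that req.raw yields the body exactly as the server sent it, so this saves the file still bz2-compressed; to unpack it you still have to pass the data through a decompressor: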
import shutil
import requests

filename = 'temp.file'
req = requests.get('http://example.com/file.bz2', stream=True)
with open(filename, 'wb') as fp:
    shutil.copyfileobj(req.raw, fp)
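To get decompressed output from the same copyfileobj pattern, the source stream can be wrapped in bz2.BZ2File, which in Python 3.3+ accepts an already open binary file object. A minimal sketch, with io.BytesIO objects standing in for req.raw and for the target file so that it runs without a network connection; I would expect req.raw to work in place of raw, but that is an assumption, not something tested here.

```python
import bz2
import io
import shutil

# Stand-in for req.raw: any binary file-like object with a read() method.
payload = b"streamed content\n" * 2000
raw = io.BytesIO(bz2.compress(payload))

# Stand-in for open(filename, 'wb').
out = io.BytesIO()

with bz2.BZ2File(raw) as src:
    # BZ2File decompresses on the fly as copyfileobj() reads from it.
    shutil.copyfileobj(src, out)

print(out.getvalue() == payload)
```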