
How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

I'm playing around with the Stack Overflow API using Python. I'm trying to decode the gzipped responses that the API gives.

import urllib, gzip

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
gzip.GzipFile(fileobj=url).read()

According to the urllib2 documentation, urlopen “returns a file-like object”.

However, when I run read() on the GzipFile object I've created using it, I get this error:

AttributeError: addinfourl instance has no attribute 'tell'

As far as I can tell, this is coming from the object returned by urlopen.

It doesn't appear to have seek either, as I get an error when I do this:

url.read()
url.seek(0)

What exactly is this object, and how do I create a functioning GzipFile instance from it?

The urlopen docs list the supported methods of the object that is returned. I recommend wrapping the object in another class that supports the methods that gzip expects.

Another option: call the read method of the response object and put the result in a StringIO object (which should support all the methods that gzip expects). This may be a little more expensive, though, because the whole response is held in memory.

For example:

import gzip
import json
import StringIO
import urllib

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
url_f = StringIO.StringIO(url.read())
g = gzip.GzipFile(fileobj=url_f)
j = json.load(g)

import urllib2
import json
import gzip
import io

url = 'http://api.stackoverflow.com/1.0/badges/name'
page = urllib2.urlopen(url)
gzip_filehandle = gzip.GzipFile(fileobj=io.BytesIO(page.read()))
json_data = json.loads(gzip_filehandle.read())
print(json_data)

io.BytesIO is available in Python 2.6+. For older versions of Python, you could use cStringIO.StringIO.
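
For readers on Python 3, the same buffer-then-decompress idea might look roughly like the sketch below. It is not part of the original answer: the helper name decode_gzip_json and the sample payload are made up for illustration, and the payload is compressed locally as a stand-in for page.read() so that the example runs without a network connection.

```python
import gzip
import io
import json


def decode_gzip_json(raw_bytes):
    """Decompress an in-memory gzip payload and parse it as JSON.

    Hypothetical helper for illustration; raw_bytes plays the role of
    the bytes returned by the response's read() method.
    """
    # BytesIO supports seek() and tell(), so GzipFile can read from it.
    with gzip.GzipFile(fileobj=io.BytesIO(raw_bytes), mode="rb") as gz:
        return json.loads(gz.read().decode("utf-8"))


# Stand-in for page.read(): a gzip-compressed JSON body built locally.
sample_body = gzip.compress(
    json.dumps({"badges": [{"name": "Teacher"}]}).encode("utf-8")
)
decoded = decode_gzip_json(sample_body)
print(decoded)
```

When the whole body is already in memory, gzip.decompress(raw_bytes) is a shorter alternative to wrapping it in BytesIO.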

Here is an update to @stefanw's answer, for anyone who thinks it is too expensive to hold the whole response in memory.

This article ( https://www.enricozini.org/blog/2011/cazzeggio/python-gzip/ ) explains why gzip doesn't work on the urlopen response. The solution is to use Python 3, where GzipFile can read from the response directly:

import urllib.request
import gzip

response = urllib.request.urlopen('http://api.stackoverflow.com/1.0/badges/name')
with gzip.GzipFile(fileobj=response) as f:
    for line in f:
        print(line)
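
To try the streaming approach without a network connection, here is a minimal sketch that feeds GzipFile an object that can only be read forward. ForwardOnlyStream and iter_gzip_lines are hypothetical names introduced for this example; the class is only a stand-in for the HTTP response and is not how urllib implements it.

```python
import gzip
import io


class ForwardOnlyStream:
    """Stand-in for an HTTP response: it offers read() but no seek() or tell()."""

    def __init__(self, data):
        self._source = io.BytesIO(data)

    def read(self, size=-1):
        return self._source.read(size)


def iter_gzip_lines(fileobj):
    """Yield decompressed lines from a gzip stream, reading sequentially."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        for line in gz:
            yield line


# Compress a small payload locally instead of fetching it over HTTP.
payload = gzip.compress(b"first line\nsecond line\n")
lines = list(iter_gzip_lines(ForwardOnlyStream(payload)))
for line in lines:
    print(line)
```

Because the data is decompressed as it is read, only a small buffer is held in memory at a time rather than the whole response.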
