
PIL: image from url, cannot identify image file

I am trying to load an image from a URL:

http://www.lifeasastrawberry.com/wp-content/uploads/2013/04/IMG_1191-1024x682.jpg

However, it fails with IOError("cannot identify image file") at the last step. I'm not sure what is going on or how to fix it. The same code has worked with many other image URLs.

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    opener.addheaders = [('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')]
    opener.addheaders = [('Accept-Encoding', 'gzip,deflate,sdch')]

    response = opener.open(image_url,None,5)
    img_file = cStringIO.StringIO(response.read())  

    image = Image.open(img_file)

This URL also fails:

http://www.canadianliving.com/img/photos/biz/Greek-Yogurt-Ceaser-Salad-Dressi1365783448.jpg

The problem is that you're asking the server for a gzip-encoded result, so the image data you receive is gzip-compressed and PIL cannot recognize it as an image. You can solve this either by leaving the Accept-Encoding header out of your request or by decompressing the gzip-encoded result manually:

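    from PIL import Image
    import urllib2
    import gzip
    import cStringIO

    # First image URL from the question
    url = 'http://www.lifeasastrawberry.com/wp-content/uploads/2013/04/IMG_1191-1024x682.jpg'

    opener = urllib2.build_opener()
    # Set all headers in one list; assigning addheaders repeatedly keeps only the last assignment
    opener.addheaders = [
        ('User-agent', 'Mozilla/5.0'),
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
        ('Accept-Encoding', 'gzip,deflate,sdch'),
    ]

    # This assumes the server actually answered with Content-Encoding: gzip
    gzipped_file = cStringIO.StringIO(opener.open(url, None, 5).read())
    image = Image.open(gzip.GzipFile(fileobj=gzipped_file))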

The problem with this approach is that if you accept multiple encodings in your HTTP request, then you'll need to look in the HTTP headers of the result to see which encoding you actually got, and then decode manually based on whatever that value indicates.
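As a rough sketch of that header check, the function below picks a decoder from the Content-Encoding value of the response. It is written against the Python 3 standard library (an assumption on my part, since urllib2 and cStringIO exist only in Python 2), and `decode_body` is a hypothetical helper name, not something urllib or PIL provides. It handles only gzip, deflate and identity; anything else, such as sdch, raises an error.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Return the decoded bytes of an HTTP body, given its Content-Encoding value.

    Hypothetical helper for illustration; supports gzip, deflate and identity only.
    """
    # A missing or empty header means the body was sent as-is
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # In HTTP, "deflate" normally means zlib-wrapped data
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
```

With a response from `urllib.request.urlopen`, this would be called as `decode_body(response.read(), response.headers.get("Content-Encoding"))`, and the result wrapped in `io.BytesIO` before being passed to `Image.open`.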

I think it's easier to set the Accept-Encoding header so that you accept only one encoding (e.g. 'identity;q=1, *;q=0' or something like that), or to go ahead and start using the requests package for HTTP.
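A minimal sketch of the identity-only option, again assuming Python 3's `urllib.request` rather than the Python 2 modules used above. The function names `build_identity_request` and `fetch_image_bytes` are made up for this example, and the header value is the one suggested in the previous paragraph.

```python
import io
import urllib.request


def build_identity_request(url):
    """Build a request that asks the server not to compress the body."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            # Accept only the uncompressed ("identity") representation
            "Accept-Encoding": "identity;q=1, *;q=0",
        },
    )


def fetch_image_bytes(url, timeout=5):
    """Download the URL and return an in-memory file that Image.open can read."""
    request = build_identity_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return io.BytesIO(response.read())
```

The result would then be used as `Image.open(fetch_image_bytes(image_url))`. If a server ignores the header and compresses anyway, the Content-Encoding check is still needed. The requests package avoids this because it decodes gzip and deflate responses transparently.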
