
imghdr / python - Can't detect type of some images (image extension)

I'm downloading a lot of images from imgur.com with a Python script. Since all my links are in the format http://imgur.com/{id}, I have to force the download by replacing the original URL with http://i.imgur.com/{id}.gif, and then I save all the images without an extension. (I know there is an Imgur API, but I can't use it because it has limitations for this kind of job.)
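For reference, the URL rewrite described above could look roughly like this. It is only a sketch: the helper names direct_image_url and download are made up for illustration, and it assumes every link has exactly the form http://imgur.com/{id}.

```python
import os
import shutil
from urllib.parse import urlsplit
from urllib.request import urlopen


def direct_image_url(page_url):
    """Turn http://imgur.com/{id} into http://i.imgur.com/{id}.gif (hypothetical helper)."""
    image_id = urlsplit(page_url).path.strip('/')
    if not image_id or '/' in image_id:
        raise ValueError('unexpected imgur link: %r' % page_url)
    return 'http://i.imgur.com/{}.gif'.format(image_id)


def download(page_url, dest_dir):
    """Save the image under dest_dir using the bare id, without an extension."""
    image_id = urlsplit(page_url).path.strip('/')
    target = os.path.join(dest_dir, image_id)
    with urlopen(direct_image_url(page_url)) as response, open(target, 'wb') as out:
        shutil.copyfileobj(response, out)
    return target
```

The .gif suffix in the URL is only there to force the download; the server may still return a JPEG or PNG, which is why the real format has to be detected afterwards.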

Now, after downloading the images, I want to use the imghdr module to determine the original extension of each image:

>>> import imghdr
>>> imghdr.what('/images/GrEdc')
'gif'

The problem is that this only works about 80% of the time. For the remaining 20%, imghdr.what returns None, and after checking some of them I noticed that they are most likely all .jpg images.

Why can't imghdr detect the format? I'm able to open these images with Ubuntu's default image viewer even without an extension, so I don't think they are corrupted.

Note that as of 2019, this bug has still not been fixed. The solution is available at the link from Paul R.

One way to work around the problem is to monkeypatch imghdr with additional JPEG tests:

# Monkeypatch the JPEG detection bug in imghdr
from imghdr import tests

def test_jpeg1(h, f):
    """JPEG data in JFIF format"""
    if b'JFIF' in h[:23]:
        return 'jpeg'


JPEG_MARK = b'\xff\xd8\xff\xdb\x00C\x00\x08\x06\x06' \
            b'\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f'

def test_jpeg2(h, f):
    """JPEG with small header"""
    if len(h) >= 32 and 67 == h[5] and h[:32] == JPEG_MARK:
        return 'jpeg'


def test_jpeg3(h, f):
    """JPEG data in JFIF or Exif format"""
    if h[6:10] in (b'JFIF', b'Exif') or h[:2] == b'\xff\xd8':
        return 'jpeg'

tests.append(test_jpeg1)
tests.append(test_jpeg2)
tests.append(test_jpeg3)

That is a known problem in the library: it fails to detect some valid JPEG images.
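To see why, it helps to look at the first bytes of a failing file. In the imghdr versions affected by this issue, the stock JPEG test only checks whether bytes 6 to 10 are b'JFIF' or b'Exif', but a valid JPEG only has to start with the start-of-image marker. Here is a minimal sketch of a signature check that does not depend on imghdr at all. The names SIGNATURES, sniff_format and sniff_file are my own, and only three formats are covered.

```python
# Leading "magic" bytes for a few common formats.
SIGNATURES = (
    (b'\xff\xd8\xff', 'jpeg'),
    (b'\x89PNG\r\n\x1a\n', 'png'),
    (b'GIF87a', 'gif'),
    (b'GIF89a', 'gif'),
)


def sniff_format(header):
    """Return a format name based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read the first 32 bytes of a file and sniff its format."""
    with open(path, 'rb') as f:
        return sniff_format(f.read(32))


# A JPEG header that goes straight to a quantization table: bytes 6-10 are
# neither b'JFIF' nor b'Exif', yet the start-of-image marker is present.
sample = b'\xff\xd8\xff\xdb\x00C\x00\x08\x06\x06'
print(sample[6:10], sniff_format(sample))
```

Calling sniff_file('/images/GrEdc') on one of the files reported as None should show whether it really starts with the JPEG marker.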

You can use a modified version of the library that detects JPEG images more reliably, which suits your case especially well because you know for sure that all the files are images.

https://bugs.python.org/issue28591

If you still fail to detect some images even with the fixed library, you can try Pillow, which supports a larger number of formats but is less lightweight and is an external dependency, not part of the Python standard library.
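With Pillow, the detection could look roughly like this. It is a sketch that assumes Pillow is installed (pip install Pillow); the helper name pillow_format is made up, and the import is guarded so the snippet still loads without the package.

```python
import io

try:
    from PIL import Image  # third-party package: Pillow
except ImportError:
    Image = None


def pillow_format(data):
    """Return a lowercase format name such as 'jpeg', or None if unidentified."""
    if Image is None:
        raise RuntimeError('Pillow is not installed')
    try:
        with Image.open(io.BytesIO(data)) as img:
            return img.format.lower() if img.format else None
    except OSError:
        # Pillow raises an OSError subclass for data it cannot identify.
        return None


if Image is not None:
    # Build a tiny JPEG in memory just to exercise the helper.
    buf = io.BytesIO()
    Image.new('RGB', (2, 2)).save(buf, format='JPEG')
    detected = pillow_format(buf.getvalue())
else:
    detected = None

print(detected)
```

For files on disk, passing open(path, 'rb').read() to the helper is enough, since Image.open only needs the header to identify the format.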

I hit this problem when creating a mail attachment with the MIMEImage class. The error looked like this (quoted in full so that searches can find it):

  File "/usr/lib/python2.7/email/mime/image.py", line 43, in __init__
    raise TypeError('Could not guess image MIME subtype')
TypeError: Could not guess image MIME subtype

The reason is that MIMEImage internally relies on the (buggy) imghdr.what:

    if _subtype is None:
        _subtype = imghdr.what(None, _imagedata)
    if _subtype is None:
        raise TypeError('Could not guess image MIME subtype')

I could circumvent the problem by using guess_type:

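import os
from email.mime.image import MIMEImage
from mimetypes import guess_type

# image is the file name and dirpath is its directory, both defined elsewhere.
# guess_type looks only at the file extension and returns (None, None)
# when the extension is missing or unknown.
mimetype, encoding = guess_type(image)
maintype, subtype = mimetype.split('/')
with open(os.path.join(dirpath, image), 'rb') as fp:
    mimeimage = MIMEImage(fp.read(), _subtype=subtype)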
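Because guess_type depends on the file extension, it does not help for files saved without one, as in the question. In that case the subtype can be derived from the leading bytes instead. This is a minimal sketch: subtype_from_signature and make_image_part are hypothetical helpers, and only JPEG, PNG and GIF are handled.

```python
from email.mime.image import MIMEImage


def subtype_from_signature(data):
    """Map the leading bytes of image data to a MIME subtype, or None."""
    if data[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    if data[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    return None


def make_image_part(data):
    """Build a MIMEImage with an explicit subtype so no guessing is needed."""
    subtype = subtype_from_signature(data)
    if subtype is None:
        raise ValueError('unrecognised image signature')
    return MIMEImage(data, _subtype=subtype)


# Made-up JPEG header without a JFIF or Exif marker, padded with zero bytes.
sample = b'\xff\xd8\xff\xdb\x00C\x00' + bytes(32)
part = make_image_part(sample)
print(part.get_content_type())
```

Passing _subtype explicitly means MIMEImage never has to guess the type itself, so the TypeError above cannot be raised.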
