Python: How to read images from zip file in memory?

I have seen variations of this question, but not in this exact context. What I have is a file called 100-Test.zip which contains 100 .jpg images. I want to open this file in memory and process each file doing PIL operations. The rest of the code is already written, I just want to concentrate on getting from the zip file to the first PIL image. This is what the code looks like now from suggestions I've gathered from reading other questions, but it's not working. Can you guys take a look and help?

import zipfile
from StringIO import StringIO
from PIL import Image

imgzip = open('100-Test.zip', 'rb')
z = zipfile.ZipFile(imgzip)
data = z.read(z.namelist()[0])
dataEnc = StringIO(data)
img = Image.open(dataEnc)

print img

But I am getting this error when I run it:

IOError: cannot identify image file <StringIO.StringIO instance at 0x7f606ecffab8>

Alternatives: I have seen other sources saying to use this instead:

image_file = StringIO(open("test.jpg",'rb').read())
im = Image.open(image_file)

But the problem is I'm not opening a file, it's already in memory inside the data variable. I also tried using dataEnc = StringIO.read(data) but got this error:

TypeError: unbound method read() must be called with StringIO instance as first argument (got str instance instead)

It turns out the problem was an extra, empty element in namelist(): the images were zipped inside a directory inside the zip file, so the first entry was the directory itself rather than an image. Here is the full code that checks for that and iterates through the 100 images.

import zipfile
from StringIO import StringIO
from PIL import Image
import imghdr

imgzip = open('100-Test.zip', 'rb')  # binary mode: a zip archive is not text
zippedImgs = zipfile.ZipFile(imgzip)

for i in xrange(len(zippedImgs.namelist())):
    print "iter", i, " ",
    file_in_zip = zippedImgs.namelist()[i]
    if (".jpg" in file_in_zip or ".JPG" in file_in_zip):
        print "Found image: ", file_in_zip, " -- ",
        data = zippedImgs.read(file_in_zip)
        dataEnc = StringIO(data)
        img = Image.open(dataEnc)
        print img
    else:
        print ""

Thanks guys!
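
To see why the first entry failed, here is a minimal sketch (Python 3) that builds a small archive in memory and inspects it. The entry names `imgs/` and `imgs/a.jpg` and the payload bytes are made up for illustration; the point is only that namelist() returns the directory as an entry of its own, and that reading it yields empty data, which PIL cannot identify as an image.

```python
import io
import zipfile

# Build a tiny zip in memory; entry names and payload are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("imgs/", b"")                     # explicit directory entry
    zf.writestr("imgs/a.jpg", b"not-a-real-jpeg")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    first_entry_data = zf.read(names[0])          # empty: nothing for PIL to identify
    # ZipInfo.is_dir() (Python 3.6+) tells directories apart from files.
    file_names = [info.filename for info in zf.infolist() if not info.is_dir()]

print(names)
print(repr(first_entry_data))
print(file_names)
```

Filtering on is_dir() rather than on the file extension is one option when every regular file in the archive is expected to be an image.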

There is no need to use StringIO. ZipFile.open() returns a file-like object that Image.open() can read directly, so the image never has to be extracted to disk. The following loops through all images in your .zip file:

import zipfile
from PIL import Image

imgzip = zipfile.ZipFile("100-Test.zip")
inflist = imgzip.infolist()

for f in inflist:
    if f.is_dir():  # skip directory entries, which are not images (Python 3.6+)
        continue
    ifile = imgzip.open(f)
    img = Image.open(ifile)
    print(img)
    # display(img)

I had the same issue. Thanks to @alfredox; I adapted that answer for Python 3, where you use io.BytesIO instead of StringIO:

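import io
import zipfile
from PIL import Image

zip_file = "100-Test.zip"  # path to the archive (file name taken from the question)

z = zipfile.ZipFile(zip_file)
for file_in_zip in z.namelist():
    if ".jpg" in file_in_zip or ".JPG" in file_in_zip:
        data = z.read(file_in_zip)    # raw bytes of the entry
        dataEnc = io.BytesIO(data)    # wrap the bytes in a file-like object
        img = Image.open(dataEnc)
        print(img)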
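
The substring test `".jpg" in file_in_zip` also matches names such as `notes.jpg.txt` and misses mixed-case extensions like `.Jpg`. As a possible refinement, here is a small sketch that compares the real suffix case-insensitively. The helper name `is_jpeg_name`, the suffix set, and the sample names are assumptions for illustration, not part of the answers above.

```python
from pathlib import PurePosixPath

# Assumed set of extensions to accept; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}

def is_jpeg_name(name):
    """Return True if a zip entry name ends with a JPEG extension (case-insensitive)."""
    # Zip entry names always use forward slashes, hence PurePosixPath.
    return PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES

# Hypothetical entry names, including a directory and a misleading file name.
names = ["imgs/", "imgs/a.jpg", "imgs/B.JPG", "imgs/notes.jpg.txt", "imgs/c.png"]
jpeg_names = [n for n in names if is_jpeg_name(n)]
print(jpeg_names)
```

In the loop above, `if is_jpeg_name(file_in_zip):` could stand in for the substring condition.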

If you need to work on pixel data, you can load an image from the zip file as a numpy array that keeps the original data shape (e.g. 32x32 RGB) by following these steps:

  1. use zipfile to get the ZipExtFile format
  2. use PIL.Image to convert ZipExtFile into image like data structure
  3. convert PIL.image into numpy array

There is no need to reshape the numpy array to the original data shape, because PIL.Image already carries that information. The output will be a numpy array with shape=(32,32,3).

import numpy as np
import zipfile
from PIL import Image

with zipfile.ZipFile(zip_data_path, "r") as zip_data:
    content_list = zip_data.namelist()
    for name_file in content_list:
        img_bytes = zip_data.open(name_file)          # 1
        img_data = Image.open(img_bytes)              # 2
        # ndarray with shape=(32,32,3)
        image_as_array = np.array(img_data, np.uint8) # 3
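
To try the three steps without a real archive, the following sketch builds a zip in memory that holds a single generated 32x32 RGB image and reads it back as an array. The entry names, the solid red test image, and the choice of PNG (lossless, so pixel values can be checked) are assumptions for illustration. It also skips directory entries, which Image.open() cannot read.

```python
import io
import zipfile

import numpy as np
from PIL import Image

# Create one 32x32 solid red PNG in memory (hypothetical test image).
png_buf = io.BytesIO()
Image.new("RGB", (32, 32), (255, 0, 0)).save(png_buf, format="PNG")

# Pack it into an in-memory zip, inside a directory, as in the question.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("imgs/", b"")
    zf.writestr("imgs/red.png", png_buf.getvalue())

arrays = {}
with zipfile.ZipFile(zip_buf) as zip_data:
    for info in zip_data.infolist():
        if info.is_dir():                          # directory entries hold no image data
            continue
        with zip_data.open(info) as img_bytes:     # 1: file-like ZipExtFile
            img_data = Image.open(img_bytes)       # 2: PIL image (decoded lazily)
            # 3: converting forces the decode while the entry is still open
            arrays[info.filename] = np.array(img_data, np.uint8)

print(arrays["imgs/red.png"].shape)
```

Converting inside the `with` block matters because PIL decodes the pixel data lazily, so the underlying stream has to stay open until the array is built.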

