
How to download images from s3 as numpy arrays?

I am trying to train a neural network on a series of images. I want to create a generator that yields each image as a numpy array. Reading a single image from a URL works like this:

from skimage import io
image_array = io.imread(url)

I want something like this:

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some-bucket')
def my_generator():
    for obj in my_bucket.objects.all():
        # This will not work: obj is an s3.ObjectSummary, e.g.
        # s3.ObjectSummary(bucket_name='manga-learn-data', key=u'one-piece-colored-5340113_06_05.png'),
        # whereas io.imread is expecting a URL or URI.
        image_array = io.imread(obj)
        yield image_array

But that io.imread call will never work. Everything I can find about downloading images from Amazon S3 says to save the object to a file on disk. I want to load it directly into an image object that I can open as an array.

Based on the docs for imread, it appears that it only supports passing in a filename or a URL, not file-like objects. So it looks like you have two options: save to a temp file, or generate a presigned URL to pass in. Here's an example of generating that presigned URL:

import boto3

s3 = boto3.client('s3')
params = {'Bucket': 'foo', 'Key': 'img.jpg'}
url = s3.generate_presigned_url(ClientMethod='get_object', Params=params)
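
To tie this back to the generator in the question, here is a minimal sketch of how the presigned URL approach might be wrapped in a loop over every object in a bucket. The names iter_image_arrays, read_image, expires_in and main are hypothetical and not part of boto3 or scikit-image. The sketch assumes every object in the bucket is an image that io.imread can decode, and that a 300-second URL lifetime is long enough for each read.

```python
def iter_image_arrays(s3_client, bucket, read_image, expires_in=300):
    """Yield one decoded image per object in `bucket`.

    s3_client  -- a boto3 S3 client, e.g. boto3.client('s3')
    read_image -- a callable that turns a URL into an array, e.g. skimage.io.imread
    expires_in -- lifetime of each presigned URL in seconds (assumed value)
    """
    # list_objects_v2 returns results in pages, so use a paginator
    # to walk through every key in the bucket.
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket produces a page without a 'Contents' entry.
        for entry in page.get('Contents', []):
            url = s3_client.generate_presigned_url(
                ClientMethod='get_object',
                Params={'Bucket': bucket, 'Key': entry['Key']},
                ExpiresIn=expires_in,
            )
            yield read_image(url)


def main():
    # Imported here so the generator above has no hard dependencies.
    import boto3
    from skimage import io

    s3 = boto3.client('s3')
    # 'some-bucket' is the placeholder bucket name from the question.
    for image_array in iter_image_arrays(s3, 'some-bucket', io.imread):
        print(image_array.shape)
```

Passing the client and the reader in as arguments keeps the generator easy to try out with a stand-in client; calling main() requires valid AWS credentials and read access to the bucket.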


 