
Best/cheapest way to scan and modify a large number of files on S3

We set up a 'ThumbNailer' image on our Amazon S3 instance, which we plan to use to store customer files. The idea was to create 150x150 pixel thumbnails whenever a file-create event fired for a new file of type jpg/png/gif. It works fine, but we then used an Amazon Snowball to transfer about 8 TB of images into S3. I had enabled the thumbnailer trigger on the bucket for incoming customer files before the Snowball import, but the Lambda script ran faster than the tmp space in the Lambda environment was being cleaned up. As a result, Lambda ran out of tmp space, and only the first few hundred (of tens of thousands) of images went through the thumbnailer script properly.

I suspected this might happen during the import, but now I need to go back through those files to generate thumbnails for the images and to store the original image resolution (width and height) as metadata on the original image files.

I'm not sure what inside the AWS cloud incurs 'transfer' fees and what does not, nor am I sure what the best method would be to generate these thumbnails and read the image resolutions. I'm aware there are 'tricks' to read the first few hundred bytes of a file rather than transferring the whole thing (i.e. to get the image resolution from the file headers), and I also have an EC2 instance set up with an s3fs FUSE connection to the respective buckets.

What is going to be the easiest and cheapest way to generate my thumbnails and store the metadata for this large number of images? I don't want to run a script across the EC2 filesystem only to find out it generates a couple hundred dollars in transfer fees!

SW

A quote from the Amazon S3 Pricing page:

Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.

Since you already have the Lambda function, I'd consider running your thumbnail job as an Amazon S3 Batch Operations job, which can invoke the function once for each object listed in a manifest.
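An S3 Batch Operations job takes a manifest listing the objects to process, and one accepted form is a CSV with one `bucket,key` pair per line and URL-encoded keys. Below is a minimal sketch of building such a manifest from a list of keys, keeping only image files. The function name `build_manifest_lines`, the suffix list (including `.jpeg`, which the question does not mention) and the bucket name are assumptions for illustration; the key listing itself would come from S3 Inventory or a bucket listing.

```python
from typing import Iterable, List
from urllib.parse import quote

# Assumption: these suffixes identify the images to thumbnail.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def build_manifest_lines(bucket: str, keys: Iterable[str]) -> List[str]:
    """Return CSV manifest lines ("bucket,key") for image keys only.

    Keys are URL-encoded; slashes are left as-is.
    """
    lines = []
    for key in keys:
        # Case-insensitive match so "PHOTO.JPG" is picked up as well.
        if key.lower().endswith(IMAGE_SUFFIXES):
            lines.append(f"{bucket},{quote(key)}")
    return lines


if __name__ == "__main__":
    sample_keys = ["a/photo 1.JPG", "notes.txt", "b/logo.png"]
    print("\n".join(build_manifest_lines("my-bucket", sample_keys)))
```

The resulting lines would be written to a `.csv` object in S3 and referenced when creating the job. Restricting the manifest to keys that have no thumbnail yet would avoid reprocessing the few hundred images that already went through.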

I'd also run it on a small portion of the files first to learn the exact costs before launching the job on a gazillion files.
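For the resolution metadata, the header-reading trick mentioned in the question can be sketched roughly as follows. This is an illustrative parser, not a full image library: the function name `image_size_from_header` is an assumption, it handles only PNG, GIF and JPEG, and for JPEG it assumes the frame header falls within the bytes supplied, which is not guaranteed when a large EXIF block comes first.

```python
import struct
from typing import Optional, Tuple

# JPEG start-of-frame markers (0xC0-0xCF) excluding DHT, JPG and DAC.
_NON_SOF = (0xC4, 0xC8, 0xCC)


def image_size_from_header(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the leading bytes of an image, or None."""
    # PNG: 8-byte signature, then the IHDR chunk with big-endian width/height.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR" and len(data) >= 24:
        return struct.unpack(">II", data[16:24])

    # GIF: 6-byte signature, then little-endian 16-bit width/height.
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        return struct.unpack("<HH", data[6:10])

    # JPEG: walk the segments until a start-of-frame marker is found.
    if data[:2] == b"\xff\xd8":
        pos = 2
        while pos + 9 <= len(data):
            if data[pos] != 0xFF:
                return None  # not at a marker; give up
            marker = data[pos + 1]
            if marker == 0xFF:  # padding byte before the real marker
                pos += 1
                continue
            seg_len = struct.unpack(">H", data[pos + 2:pos + 4])[0]
            if 0xC0 <= marker <= 0xCF and marker not in _NON_SOF:
                # Layout: length(2), precision(1), height(2), width(2).
                height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
                return (width, height)
            pos += 2 + seg_len  # skip marker plus segment payload
    return None
```

With boto3, the leading bytes could be fetched with a ranged request such as `s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-65535")["Body"].read()`, where the 64 KiB range is a guess. Each ranged GET is still billed as a request, and if the Lambda function already downloads the whole image to build the thumbnail, it can read the dimensions there instead. Note also that S3 object metadata cannot be edited in place; it is set by copying the object onto itself with `MetadataDirective="REPLACE"`.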

