
Using AWS Lambda as a serverless input-sanitization proxy to S3

I'm looking to upload images directly from the client using S3 presigned URLs, and I keep hitting these two barriers:

  1. Potentially malicious files and the need to sanitize input (in my case, images).
  2. Potentially malicious uploading of too many files at once.

Obviously this cannot be done on the client, since the weakness would still be exposed. Checking file extensions, which to my knowledge can be done with S3 bucket policies, isn't a real solution to this problem. Realistically I would be looking for a file-sanitizing SDK (this project uses Node, so doing this server-side would be quite simple).
Can AWS Lambda supply this type of functionality? For this use case, would it still make sense to use Lambda at all? It seems to me that piping images to S3 through Lambda to "save" on server-side piping is a little silly, considering a double upload is still required.

Generate the presigned URL on demand and keep it valid only for as long as your use case requires. That way the URL is live only when it is needed, and each client gets a different one.

I ended up implementing a direct-to-S3 upload from the client using presigned POST data, generated on demand by a dedicated API server and sent to the client. Not to be confused with a presigned URL that files can be uploaded to, the presigned data is used to build an HTML form; submitting that form uploads the files.

Using server-generated PUT URLs produced by the Node aws-sdk accomplishes a few goals, namely:

  • The URLs are "presigned", so no further authentication is needed and credentials never have to be exposed on the client.

  • It is possible to limit the upload to a certain key, and to limit the upload window by defining an expiration time for the URL. Needless to say, the URL can do no more than the credentials used to sign it are allowed to do.
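
As a rough illustration of the two points above, here is a minimal sketch of issuing such a URL. It assumes an aws-sdk v2 S3 client is passed in as `s3`; the function name `createPutUrl`, the `uploads/<userId>/` key layout, and the 60-second default are made up for the example and are not from my actual code.

```javascript
// Sketch: issue a short-lived presigned PUT URL for exactly one object key.
// `s3` is assumed to be an aws-sdk v2 client (new AWS.S3()), created on the
// server with credentials that are only allowed to put objects.
function createPutUrl(s3, { bucket, userId, fileName, expiresSeconds = 60 }) {
  // Hypothetical key layout: one "folder" (prefix) per user.
  const key = `uploads/${userId}/${fileName}`;

  // The signed URL is valid only for this operation, this key, and this
  // lifetime; the client never sees the credentials themselves.
  return s3.getSignedUrlPromise('putObject', {
    Bucket: bucket,
    Key: key,
    Expires: expiresSeconds, // URL lifetime in seconds
  });
}
```

The client would then send the file body with an HTTP PUT to the returned URL. Note that nothing here restricts how large that body is.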

Presigned POST data, on the other hand, requires more work, but it allows enforcing another limit that is critical for security: upload size.
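
A minimal sketch of generating such POST data, again assuming an aws-sdk v2 S3 client passed in as `s3`. The name `createUploadForm`, the per-user prefix, and the `Content-Type` condition are illustrative assumptions; the size limit is the `content-length-range` policy condition.

```javascript
// Sketch: build presigned POST data that caps the upload size.
// `s3` is assumed to be an aws-sdk v2 client; all other names are illustrative.
function createUploadForm(s3, { bucket, userId, maxBytes, expiresSeconds = 60 }) {
  const prefix = `uploads/${userId}/`; // hypothetical per-user prefix

  const params = {
    Bucket: bucket,
    Expires: expiresSeconds, // policy lifetime in seconds
    Conditions: [
      // The object key must stay inside this user's prefix.
      ['starts-with', '$key', prefix],
      // S3 rejects any body whose size falls outside this byte range.
      ['content-length-range', 1, maxBytes],
      // Only constrains the declared type; it proves nothing about the content.
      ['starts-with', '$Content-Type', 'image/'],
    ],
  };

  // createPresignedPost yields { url, fields } for the client-side form.
  return new Promise((resolve, reject) => {
    s3.createPresignedPost(params, (err, data) => (err ? reject(err) : resolve(data)));
  });
}
```

On the client, the returned `fields` are added to a multipart form together with `key`, `Content-Type`, and finally the file itself (S3 expects the file to be the last field), and the form is posted to `url`.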


The other half of my problem was input sanitization. Routing the files to S3 through my server (the standard way) would let me sanitize the input in any way I see fit. Client-side upload, on the other hand, cannot prevent malicious users from uploading any file they want, including malicious files masquerading as images.

My first attempt at "catching" the files before they reach S3 was a Lambda function that would receive the photos ahead of S3 and act as a sort of proxy, sanitizing them and forwarding only clean data. It didn't pan out, as there is a strict cap on data transfers to a Lambda behind API Gateway: theoretically 6 MB, although in my experience less than that (possibly because binary bodies are base64-encoded on the way in, which inflates them by about a third). Also, it effectively means double the upload time.

My second solution ended up working for me: I wrote a Lambda function that is triggered by every POST upload to the relevant bucket, and implemented two constraints:

  1. Keeping each user folder under X MB (by listing the folder with the ListObjects operation provided by the aws-sdk and deleting all older photos that don't make the X MB cut).

  2. Sanitizing every uploaded file using the Node 'mmmagic' package, and deleting files that aren't images (checking the file extension is shallow and can't securely differentiate between images and non-images). At first I planned to also strip all EXIF data (as EXIF data can contain malicious code), but I ended up not doing that because it seemed excessive for a POC.
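
The two checks can be sketched as pure functions, so they can be tried without AWS. This is not my actual Lambda code: the function names are invented, the object shape (`Key`, `Size`, `LastModified`) mirrors what an S3 listing returns, and the image check is a hand-rolled magic-byte comparison for three formats standing in for 'mmmagic', which does far more thorough detection.

```javascript
// Quota check: keep the newest objects that fit within maxBytes and return
// the keys of everything older, which the Lambda would then delete.
function keysOverQuota(objects, maxBytes) {
  const newestFirst = [...objects].sort(
    (a, b) => new Date(b.LastModified) - new Date(a.LastModified)
  );
  const toDelete = [];
  let used = 0;
  let full = false;
  for (const obj of newestFirst) {
    if (!full && used + obj.Size <= maxBytes) {
      used += obj.Size;
    } else {
      full = true; // once the budget is exceeded, every older object goes too
      toDelete.push(obj.Key);
    }
  }
  return toDelete;
}

// Content check (simplified stand-in for mmmagic): compare the leading bytes
// of the file against a few well-known image signatures.
const SIGNATURES = [
  { type: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

// Returns the detected MIME type, or null if the bytes match no known image.
function sniffImageType(buffer) {
  const match = SIGNATURES.find(
    ({ bytes }) =>
      buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)
  );
  return match ? match.type : null;
}
```

In the Lambda itself, the handler would presumably fetch the start of the new object, delete it if `sniffImageType` returns `null`, then list the user's prefix and delete whatever `keysOverQuota` returns. A matching signature only shows that the file starts like an image, so this is a filter rather than a guarantee.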

Using these features (expiry, credentials on the server, limit on upload size, limit on folder size, limit to strictly uploading, deleting non-images) I believe a fairly secure (although not hermetically so) and very efficient upload can take place.
