
Using AWS Lambda as a serverless input-sanitization proxy to S3

Original post: 2020-01-26 10:49:45 3 2 javascript/ amazon-web-services/ validation/ amazon-s3/ aws-lambda

I'm looking to upload images directly from the client using presigned URLs (S3), and I keep hitting these two barriers:

Potentially malicious files and the need to sanitize input (in my case, images).
Potentially malicious uploading of too many files at once.

Obviously this cannot be done on the client, as the weakness will still be exposed. Monitoring file extensions, which to my knowledge can be accomplished using AWS S3 bucket policies, isn't a real solution to this problem - realistically I would be looking for file-sanitizing SDKs (for this project I'm using Node, so accomplishing this server-side would be quite simple).
Can AWS Lambda supply this type of functionality? For this use case, would it still make sense to use Lambda at all? It seems to me that piping images to S3 through Lambda to "save" on server-side piping is a little silly, considering a double upload is still required.

2 answers

Generate the presigned URL on demand and keep it alive only for as long as your use case requires. That way the presigned URL is live only when you need it, and it is different for every client.

I ended up implementing a direct-to-S3 upload from the client using presigned POST data generated on demand on a dedicated API server and sent to the client. Not to be confused with a presigned URL that files can be uploaded to, the presigned data is used to build an HTML form; submitting that form uploads the files.
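
As a rough illustration of the form step, here is a minimal sketch. It assumes the presigned data has the shape `{ url, fields }` (the shape returned by `createPresignedPost` in aws-sdk v2); `renderUploadForm` and `escapeHtml` are hypothetical helper names, not part of the original answer.

```javascript
// Escape a value before placing it inside an HTML attribute.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build an upload form from presigned POST data of the assumed shape
// { url: string, fields: { [name]: value } }.
function renderUploadForm(presigned) {
  const hidden = Object.entries(presigned.fields)
    .map(([name, value]) =>
      `  <input type="hidden" name="${escapeHtml(name)}" value="${escapeHtml(value)}">`)
    .join('\n');

  // S3 ignores form fields that come after the file, so the file input goes last.
  return [
    `<form action="${escapeHtml(presigned.url)}" method="post" enctype="multipart/form-data">`,
    hidden,
    '  <input type="file" name="file" accept="image/*">',
    '  <button type="submit">Upload</button>',
    '</form>',
  ].join('\n');
}
```

The `accept="image/*"` attribute is only a convenience for honest users; it enforces nothing, which is why the server-side checks described further down are still needed.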

Server-generated PUT URLs produced by the Node aws-sdk accomplish a few goals, namely:

The URLs are "presigned", so no further authentication is needed and credentials never need to be exposed on the client.
It is possible to limit the upload to a certain key, and to limit the upload window by defining an expiration time for the URL. Needless to say, the URL is limited from the start to the actions permitted to the credentials used to sign it.
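
For reference, generating such a PUT URL might look roughly like the sketch below. It assumes `s3` is an aws-sdk v2 S3 client created on the server; the bucket name, key layout, and 60-second expiry are illustrative assumptions, and `createUploadUrl` is a hypothetical helper.

```javascript
// Sketch only. `s3` is assumed to be an aws-sdk v2 client, e.g.:
//   const AWS = require('aws-sdk');
//   const s3 = new AWS.S3();
function createUploadUrl(s3, userId, fileName) {
  return s3.getSignedUrlPromise('putObject', {
    Bucket: 'example-upload-bucket',      // assumed bucket name
    Key: `uploads/${userId}/${fileName}`, // the URL is valid for this key only
    Expires: 60,                          // seconds until the URL stops working
  });
}
```

The client then sends the file with an HTTP PUT to the returned URL. Note that nothing here restricts the size of the uploaded body.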

Presigned POST data, on the other hand, requires more work, but it allows the enforcement of another limitation that is critical for security: upload size.
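
A minimal sketch of the size limit, assuming an aws-sdk v2 S3 client is passed in as `s3`. The bucket name and expiry are placeholders and `createImageUploadPost` is a hypothetical helper; the size cap itself is expressed with the `content-length-range` policy condition.

```javascript
// Sketch only: resolves to { url, fields }, which the client turns into a form.
function createImageUploadPost(s3, key, maxBytes) {
  const params = {
    Bucket: 'example-upload-bucket', // assumed bucket name
    Fields: { key },                 // the upload is pinned to this key
    Expires: 60,                     // seconds the presigned data stays valid
    Conditions: [
      // S3 rejects the upload if the body falls outside this byte range.
      ['content-length-range', 1, maxBytes],
    ],
  };
  return new Promise((resolve, reject) => {
    s3.createPresignedPost(params, (err, data) => (err ? reject(err) : resolve(data)));
  });
}
```

For example, `createImageUploadPost(s3, 'uploads/user-1/cat.jpg', 5 * 1024 * 1024)` would cap the upload at 5 MiB.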

The other half of my problem was concerned with input sanitization - routing the files to S3 through my server (the standard way) would allow me to sanitize the input in any way I see fit. Client-side upload, on the other hand, cannot prevent malicious users from uploading any file they want, including malicious files masquerading as images.

My first attempt at "catching" the files before they reach S3 - an AWS Lambda function that would receive the photos before S3 and act as a sort of proxy, sanitizing and forwarding only clean data - didn't pan out, as there is a strict cap on data transfers to a Lambda through API Gateway - theoretically 6 MB, although in my experience less than that. Also, that effectively means double the upload time.
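
One possible reason for the lower practical limit, which the original post does not confirm, is that binary bodies are typically handed to the function base64-encoded, and base64 inflates data by about a third. The arithmetic can be sketched as follows, using a nominal 6 MiB limit as an assumption; `maxRawBytesForBase64Limit` is a hypothetical helper.

```javascript
// Largest raw payload whose base64 encoding still fits in `limitBytes`.
// Base64 emits 4 output bytes for every 3 input bytes (rounded up to a full group).
function maxRawBytesForBase64Limit(limitBytes) {
  return Math.floor(limitBytes / 4) * 3;
}

const NOMINAL_LIMIT = 6 * 1024 * 1024; // assumed nominal payload limit in bytes
console.log(maxRawBytesForBase64Limit(NOMINAL_LIMIT)); // 4718592 bytes, i.e. 4.5 MiB
```

The rest of the request (headers and the surrounding event JSON) also counts toward the limit, so the usable image size would be somewhat smaller still.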

My second solution ended up working for me - I wrote a Lambda function that is triggered by every POST upload to the relevant bucket, and implemented two constraints:

Keeping each user folder under X MB (by using the listObjects method provided by the aws-sdk and deleting all older photos that don't make the X MB cut).
Sanitizing every uploaded file using the Node 'mmmagic' package, and deleting files that aren't images (checking the file extension is shallow and can't securely differentiate between images and non-images). At first I planned to also strip all EXIF data (as EXIF data can contain malicious code), but I ended up not doing that because it seemed excessive for a POC.
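
The two constraints above can be sketched roughly as follows. This is not the original function: it swaps 'mmmagic' for a tiny hand-rolled signature check covering only JPEG, PNG, and GIF, so that the example has no native dependencies. It assumes an aws-sdk v2 S3 client passed in as `s3` and keys of the form `<user folder>/<file name>`. All function names are hypothetical.

```javascript
// Leading "magic" bytes of a few common image formats.
const IMAGE_SIGNATURES = [
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },
];

// Returns the detected MIME type, or null if the buffer matches no known signature.
function sniffImageType(buffer) {
  const match = IMAGE_SIGNATURES.find(({ bytes }) =>
    buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b));
  return match ? match.mime : null;
}

// Walks the listing newest-first and returns the keys of every object
// that pushes the running total past maxBytes (i.e. the oldest ones).
function pickObjectsToDelete(objects, maxBytes) {
  const newestFirst = [...objects].sort(
    (a, b) => new Date(b.LastModified) - new Date(a.LastModified));
  let total = 0;
  const toDelete = [];
  for (const obj of newestFirst) {
    total += obj.Size;
    if (total > maxBytes) toDelete.push(obj.Key);
  }
  return toDelete;
}

// Sketch of the handler body for an S3 "object created" event.
async function handleUpload(event, s3, maxFolderBytes) {
  for (const record of event.Records) {
    const Bucket = record.s3.bucket.name;
    // Keys in S3 event notifications are URL-encoded, with spaces as '+'.
    const Key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Only the first few bytes are needed to check the signature.
    const head = await s3.getObject({ Bucket, Key, Range: 'bytes=0-15' }).promise();
    if (sniffImageType(head.Body) === null) {
      await s3.deleteObject({ Bucket, Key }).promise();
      continue;
    }

    // Single listing call; folders with more than 1000 objects would need pagination.
    const Prefix = Key.slice(0, Key.lastIndexOf('/') + 1);
    const listing = await s3.listObjectsV2({ Bucket, Prefix }).promise();
    const stale = pickObjectsToDelete(listing.Contents || [], maxFolderBytes);
    if (stale.length > 0) {
      await s3.deleteObjects({
        Bucket,
        Delete: { Objects: stale.map((k) => ({ Key: k })) },
      }).promise();
    }
  }
}
```

A signature check like this only confirms that the file starts like an image; it does not inspect the rest of the content, and a library backed by libmagic such as 'mmmagic' recognizes far more formats.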

With these measures (expiry, credentials kept on the server, limit on upload size, limit on folder size, upload-only permissions, deletion of non-images) I believe a fairly secure (although not hermetically so) and very efficient upload can take place.