
How can I use AWS Cognito to restrict access to S3 files?

I am creating a web application that needs to serve large files from an S3 bucket to users for download. Users in our application are authenticated by Cognito. I would like to have an S3 bucket of files such that certain Cognito users are allowed to download only certain files. From my research I have found a few ways of doing this, but none of them seems to fit my use case perfectly, as far as I can tell.

AWS allows S3 access to be permissioned per Cognito user. This is so close to what I need, but sadly does not seem usable. In our application's security scheme, Cognito logins belong to an organization, and each organization shares all of its data on the backend. So I'd need to grant S3 access to every login in an organization as defined by my database, not to individual user logins by name.
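For reference, the per-user permissioning referred to above is usually done with an IAM policy variable on the Cognito identity pool's authenticated role, scoping each identity to its own key prefix. A sketch (the bucket name `my-app-files` is hypothetical); note that it keys on the Cognito identity, which is exactly why it can't express a per-organization rule from an external database:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::my-app-files/${cognito-identity.amazonaws.com:sub}/*"
      ]
    }
  ]
}
```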

Presigned URLs seem like the typical solution for this, but still not quite what I need. A presigned URL gives the user an expiring URL to download a file, so I can issue one per user from my backend. But I don't really want the URL to expire; I'd like it to last forever. That's not a huge deal, because the permanent URL could be an API endpoint that redirects to a dynamically created presigned URL expiring in, say, one minute. The real problem is that the presigned URL grants access to anyone who has it, which doesn't fit our security model of restricting access to Cognito logins. If the URL gets out, whether by a user copying and pasting it or by a packet sniffer, the security breaks down. Sure, the URL expires, but it doesn't seem exactly right for this project.

Another option I've considered is implementing this in code myself: an API endpoint that opens the S3 object as a stream and returns it to the user as a downloadable file stream. The security of this matches our needs perfectly, since the API endpoint would of course validate the Cognito user's auth token. But routing the file from S3 through my backend and on to the user is needless network traffic, could be slower, and potentially requires a lot of memory in the backend process.

It seems like cutting out the middleman and allowing the user direct access to the S3 bucket, albeit with the correct per-user permission restrictions, would be the best solution. I just can't find any project or tutorial that does this in a recommended, best-practices way that fits my project. My use case seems pretty common. Is there a better way to do this?

One approach I have seen apps take to this problem is to use a fully public bucket with unguessable S3 object names. For instance, you would store the user's file at s3://public-bucket/<secure hash>, and then in the response redirect them to the public URL for that object, so the user downloads directly from S3. Because the bucket does not allow listing its objects, this is theoretically secure: the random object name acts somewhat like a password that one must know exactly to access the file. And because traffic to S3 goes over HTTPS, the URL is not exposed in transit.
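A minimal sketch of generating such an unguessable key (the helper name and layout are illustrative, not from any library): `secrets.token_urlsafe(32)` yields roughly 256 bits of entropy, which is what makes the name act like a password.

```python
import secrets


def unguessable_key(filename: str) -> str:
    """Build an S3 object key under a cryptographically random prefix,
    so the key cannot be guessed or enumerated. Hypothetical helper."""
    token = secrets.token_urlsafe(32)  # ~43 URL-safe characters
    return f"{token}/{filename}"


key = unguessable_key("report.pdf")
print(key)  # e.g. "qLx...random-token.../report.pdf"
```

The backend would upload to this key once, store the key in its database, and redirect authorized users to the object's public URL.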

Now personally this feels a little icky to me, because unlike a password, the object name is visible in the browser history and probably other places. But it's probably not the worst thing in the world if the data is not very sensitive.

One thing you mention that I disagree with:

Another option I've considered is implementing this in code myself by making an API endpoint that creates a downloadable file stream which is created by accessing the S3 object also as a file stream. ... But reading from the S3 bucket into my backend and then off to the user is needless network traffic, could be slower, and also requires potentially a lot of memory in the backend process

This is probably the best solution in my opinion (having a webserver as an intermediary), because you have full control over your application logic and can sleep soundly knowing there's less chance of shenanigans and data exposure.

I'm very skeptical that it would have much computational overhead. Streaming data through from S3 should be fast and use very little memory (you could probably do it on a t2.micro and be fine unless you have a ton of requests). Any web framework should let you stream data in an HTTP response, so you don't need to slurp the file into memory at all. I've built similar things and it's never been a performance bottleneck for me.
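The "no slurping" point can be shown with a plain chunked generator; the stand-in `io.BytesIO` below plays the role of the S3 body (in real code, `body` would be `s3.get_object(Bucket=..., Key=...)["Body"]`, and the generator would be handed to the framework's streaming response, e.g. Flask's `Response(generator)`):

```python
import io


def stream_in_chunks(body, chunk_size: int = 64 * 1024):
    """Yield successive chunks from a file-like object, so at most
    one chunk is held in memory at a time."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for an S3 object body: a 1 MiB in-memory file.
fake_body = io.BytesIO(b"x" * (1024 * 1024))
total = sum(len(c) for c in stream_in_chunks(fake_body))
print(total)  # 1048576 bytes streamed, only 64 KiB resident at a time
```

Memory use stays at one chunk regardless of object size, which is why a small instance can proxy large downloads.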
