
Serving files stored in S3 in express/nodejs app

I have an app where users' photos are private. I store the photos (and their thumbnails) in AWS S3. There is a page on the site where a user can view his photos (i.e. the thumbnails). My problem is how to serve these files. The options I have evaluated are:

  • Serving files from CloudFront (or AWS) using signed URLs. The problem is that every time the user refreshes the page I have to generate all those signed URLs again and load them, so I won't be able to cache the images in the browser, which would have been a good choice. Is there any way to still do that in JavaScript? I can't make those URLs valid for longer because of security concerns. And within that time frame, anyone who gets hold of a URL can view the file without going through the app's authentication.
  • The other option is to serve the file from my Express app itself, streaming it from the S3 servers. This lets me set HTTP cache headers and so enables browser caching. It also makes sure no one can view a file without being authenticated. Ideally I would like to hand the streaming off to NGINX, since I am hosting behind an NGINX proxy, but as far as I can see that is only possible if the file exists on the same machine's filesystem. Here I have to stream it myself and return once the stream is complete. I don't want to store the files locally.

I am not able to judge which of the two options is the better choice. I want to offload as much work as possible to S3 or CloudFront, but even with signed URLs the request goes to my servers first. I also want caching.

So what would be the ideal way to do this, with answers to the specific questions about each method?

I would just stream it from S3. It's very easy, and signed URLs are much more difficult. Just make sure you set the Content-Type and Content-Length headers when you upload the images to S3.

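var express = require('express')
var app = express()

// knox is a lightweight S3 client; fill in your own credentials and bucket.
var aws = require('knox').createClient({
  key: '',
  secret: '',
  bucket: ''
})

app.get('/image/:id', function (req, res, next) {
  // Reject anyone who is not logged in (req.user comes from your auth middleware).
  if (!req.user || !req.user.is.authenticated) {
    var err = new Error('Forbidden')
    err.status = 403
    next(err)
    return
  }

  aws.get('/image/' + req.params.id)
  .on('error', next)
  .on('response', function (resp) {
    if (resp.statusCode !== 200) {
      resp.resume() // discard the S3 error body
      var err = new Error('Not Found')
      err.status = 404
      next(err)
      return
    }

    res.setHeader('Content-Length', resp.headers['content-length'])
    res.setHeader('Content-Type', resp.headers['content-type'])

    // Forward S3's validators; req.fresh compares them with the request's
    // If-None-Match / If-Modified-Since headers.
    if (resp.headers['etag']) res.setHeader('ETag', resp.headers['etag'])
    if (resp.headers['last-modified']) res.setHeader('Last-Modified', resp.headers['last-modified'])

    // cache-control?
    // expires?

    // The browser's cached copy is still valid: answer without a body.
    if (req.fresh) {
      resp.resume()
      res.statusCode = 304
      res.end()
      return
    }

    // HEAD requests get the headers only.
    if (req.method === 'HEAD') {
      resp.resume()
      res.statusCode = 200
      res.end()
      return
    }

    // Stream the S3 object straight through to the client.
    resp.pipe(res)
  })
  .end() // knox returns an http.ClientRequest; nothing is sent until end() is called
})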

If you redirect the user to a signed URL using 302 Found, the browser will cache the resulting image according to its Cache-Control header and won't ask for it a second time.

To prevent the browser from caching the redirect to the signed URL itself, you should send a proper Cache-Control header along with it:

Cache-Control: private, no-cache, no-store, must-revalidate

So the next time, it will send a request to the original URL and be redirected to a new signed URL.

You can generate a signed URL with knox using its signedUrl method.
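The redirect approach could be sketched roughly as below. This is only an illustration: makeRedirectHandler and its signUrl callback are hypothetical names, not part of knox or Express. With knox, the callback could wrap the client's signedUrl method, which takes a path and an expiry Date.

```javascript
// Sketch only: `signUrl` is a hypothetical callback that returns a signed URL
// for a storage path and an expiry Date (knox's signedUrl could play this role).
function makeRedirectHandler(signUrl, ttlSeconds) {
  return function (req, res) {
    var expires = new Date(Date.now() + ttlSeconds * 1000)
    var target = signUrl('/image/' + req.params.id, expires)

    // Keep the redirect itself out of every cache, so each page load
    // hits the app again and receives a fresh signed URL.
    res.setHeader('Cache-Control', 'private, no-cache, no-store, must-revalidate')
    res.setHeader('Location', target)
    res.statusCode = 302
    res.end()
  }
}
```

It would be mounted behind whatever authentication middleware the app already uses, for example app.get('/image/:id', makeRedirectHandler(function (path, expires) { return aws.signedUrl(path, expires) }, 60)).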

But don't forget to set proper headers on every uploaded image. I'd recommend using both the Cache-Control and Expires headers, because some browsers have no support for Cache-Control, and Expires only lets you set an absolute expiration time.

With the second option (streaming images through your app) you'll have better control over the situation. For example, you'll be able to generate the Expires header for each response according to the current date and time.
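As a rough sketch of that per-response control, the helper below builds both headers from the current time. The name cacheHeaders and the maxAgeSeconds parameter are made up for illustration; pick a lifetime that suits your own policy.

```javascript
// Build caching headers for a private image response.
// `maxAgeSeconds` and `now` are illustrative parameters, not from any library.
function cacheHeaders(maxAgeSeconds, now) {
  now = now || new Date()
  return {
    // `private` keeps shared proxies from storing a user's photo.
    'Cache-Control': 'private, max-age=' + maxAgeSeconds,
    // Expires is an absolute HTTP date, computed from the current time.
    'Expires': new Date(now.getTime() + maxAgeSeconds * 1000).toUTCString()
  }
}
```

Inside the streaming route, the result could be applied with res.setHeader for each key before piping the S3 response.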

But what about speed? Using signed URLs has two advantages that may affect page load speed.

First, you won't overload your server. Generating signed URLs is fast because you're just computing a hash with your AWS credentials. To stream images through your server, on the other hand, you'll need to maintain a lot of extra connections during page load. Still, it won't make any real difference unless your server is heavily loaded.
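To show why signing is cheap, here is a minimal sketch of S3's legacy query-string authentication (Signature Version 2), which is a single HMAC-SHA1 over a short string. The function name signS3Url is made up, the key is assumed to contain only URL-safe characters, and newer AWS regions require Signature Version 4, so treat this as an illustration of the cost rather than production code.

```javascript
var crypto = require('crypto')

// Legacy S3 query-string authentication (Signature Version 2).
// Assumes `key` contains only URL-safe characters.
function signS3Url(accessKeyId, secretKey, bucket, key, expiresEpochSeconds) {
  var resource = '/' + bucket + '/' + key
  // Fields: verb, Content-MD5, Content-Type, Expires, canonicalized resource.
  var stringToSign = ['GET', '', '', expiresEpochSeconds, resource].join('\n')
  var signature = crypto
    .createHmac('sha1', secretKey)
    .update(stringToSign)
    .digest('base64')

  return 'https://' + bucket + '.s3.amazonaws.com/' + key +
    '?AWSAccessKeyId=' + encodeURIComponent(accessKeyId) +
    '&Expires=' + expiresEpochSeconds +
    '&Signature=' + encodeURIComponent(signature)
}
```

No network call is involved, so signing a page full of thumbnails costs only a few HMAC computations.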

Second, browsers keep only a limited number of parallel connections per hostname during page load (two under the old HTTP/1.1 recommendation; modern browsers typically allow around six). So with images on a different host, the browser will keep resolving image URLs in parallel while downloading the images. It will also keep image downloads from blocking the download of other resources.

Anyway, to be absolutely sure you should run some benchmarks. My answer is based on my knowledge of the HTTP specification and my experience in web development, but I have never tried to serve images this way myself. Serving public images with a long cache lifetime directly from S3 increases page speed, and I believe that won't change if you do it through redirects.

And you should keep in mind that streaming images through your server will bring all the benefits of Amazon CloudFront to naught. But as long as you're serving content directly from S3, both options will work fine.

Thus, there are two cases where using signed URLs should speed up your page:

  • If you have a lot of images on a single page.
  • If you are serving images using CloudFront.

If you have only a few images on each page and serve them directly from S3, you probably won't see any difference at all.

Important Update

I ran some tests and found that I was wrong about caching. It's true that browsers cache the images they were redirected to. But the browser associates the cached image with the URL it was redirected to, not with the original one. So when the browser loads the page a second time, it requests the image from the server again instead of fetching it from the cache. Of course, if the server responds with the same redirect URL as the first time, the browser will use its cache, but that's not the case for signed URLs.

I found that forcing the browser to cache the signed URL as well as the data it receives solves the problem. But I don't like the idea of caching a redirect URL that becomes invalid. I mean, if the browser somehow loses the image from its cache, it will try to request it again using the invalid signed URL it has cached. So I think it's not an option.

And it doesn't matter whether CloudFront serves images faster or whether browsers limit the number of parallel downloads per hostname: the advantage of using the browser cache outweighs all the disadvantages of piping images through your server.

And it looks like most social networks solve the problem of private images by hiding the actual URLs behind some private proxy. They store all their content on public servers, but there is no way to get the URL of a private image without authorization. Of course, if you open a private image in a new tab and send the URL to a friend, he'll be able to see the image too. So if that's not acceptable for you, it will be best to use Jonathan Ong's solution.

I would be concerned about using the CloudFront option if the photos really do need to remain private. It seems like you'll have a lot more flexibility administering your own security policy. I think the nginx setup may be more complex than necessary. Express should give you very good performance as a remote proxy, where it uses request to fetch items from S3 and streams them through to authorized users. I would highly recommend taking a look at Asset Rack, which uses hash signatures to enable permanent caching in the browser. You won't be able to use the default Racks, because you need to calculate the MD5 of each file (perhaps on upload?), which you can't do while it's streaming. But depending on your application, it could save you a lot of effort if browsers never need to refetch the images.
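The hash-signature idea can be sketched without Asset Rack itself. The snippet below is not Asset Rack's API; contentHash, fingerprintedPath and the URL layout are assumptions for illustration. The point is that the MD5 is computed once at upload time and becomes part of the URL, so a given URL never changes content and can be cached for a very long time.

```javascript
var crypto = require('crypto')

// Hex MD5 of the file contents, computed once at upload time.
function contentHash(buffer) {
  return crypto.createHash('md5').update(buffer).digest('hex')
}

// Hypothetical URL scheme: the hash is part of the path,
// so a changed file automatically gets a new URL.
function fingerprintedPath(id, buffer) {
  return '/image/' + id + '-' + contentHash(buffer)
}

// Since a fingerprinted URL never changes content, it can be cached long-term.
// `private` still keeps shared caches from storing the photo.
var LONG_LIVED_CACHE_CONTROL = 'private, max-age=31536000, immutable'
```

You would store the fingerprinted path alongside the photo record at upload and emit it in the page, while the authenticated route serves it with the long-lived Cache-Control value.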

Regarding your second option, you should be able to set cache control headers directly in S3.

Regarding your first option: have you considered securing your images a different way? When you store an image in S3, couldn't you use a hashed and randomised filename? It would be quite straightforward to make the filename difficult to guess, and this way you'd have no performance issues when viewing the images.

This is the technique Facebook uses. You can still view an image when you're logged out, as long as you know the URL.
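A hard-to-guess object key could be generated along these lines. This is a minimal sketch; randomObjectKey is a made-up helper, and 16 random bytes is an arbitrary choice that gives 128 bits of unpredictability.

```javascript
var crypto = require('crypto')
var path = require('path')

// Replace the user's filename with 32 random hex characters,
// keeping only the (lowercased) extension.
function randomObjectKey(originalName) {
  var ext = path.extname(originalName).toLowerCase()
  return crypto.randomBytes(16).toString('hex') + ext
}
```

Keep in mind that this only hides the URL; as noted above, anyone who learns it can still fetch the image.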

