AWS CLI S3 CP performance is painfully slow
I've got an issue whereby uploads to and downloads from AWS S3 via the aws cli are very slow. By very slow I mean it consistently takes around 2.3s for a 211k file, which indicates an average download speed of less than 500 KB/s; that is extremely slow for such a small file. My webapp is heavily reliant on internal APIs, and I've narrowed it down: the bulk of the APIs' round-trip time is spent uploading and downloading files from S3.
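The figures above can be checked with a small timing and throughput sketch. It reads "211k" as 211 kilobytes with 1 KB = 1000 bytes, which is an assumption, and the helper names `time_command` and `throughput` are illustrative only.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def throughput(size_bytes, seconds):
    """Return (kilobytes per second, kilobits per second), with 1 KB = 1000 bytes."""
    bytes_per_s = size_bytes / seconds
    return bytes_per_s / 1000, bytes_per_s * 8 / 1000


if __name__ == "__main__":
    # Figures from the question: a 211k file taking about 2.3s.
    kb_s, kbit_s = throughput(211_000, 2.3)
    print(f"{kb_s:.1f} KB/s, {kbit_s:.1f} kbit/s")  # 91.7 KB/s, 733.9 kbit/s

    # A trivial command is timed here so the sketch runs anywhere. To time a
    # real transfer, pass the CLI invocation instead, for example
    # ["aws", "s3", "cp", "s3://my-bucket/path/to/my/object", "."].
    print(f"{time_command([sys.executable, '-c', 'pass']):.3f}s")
```

Timing a no-op command alongside the real transfer gives a rough idea of how much of the 2.3s is process start-up rather than data transfer.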
Some details:
So to summarise:
I need to improve AWS CLI S3 download performance because the API is going to be quite heavily used in the future.
Any suggestions would be gratefully received.
Okay, this was a combination of things.
I'd had problems with the AWS PHP API SDK previously (mainly related to orphaned threads when copying files), so I had changed my APIs to use the AWS CLI for simplicity and reliability reasons. Although they worked, I encountered a few performance issues:
To cut a long story short, I've done two things:
My APIs are now performing much better, i.e. from 2.3s to an average of around 0.07s.
This doesn't make my original issue go away, but at least performance is much better.
I found that if I tried to download an object using aws s3 cp, the download would hang close to finishing when the object size was greater than 500MB.
However, using get-object directly causes no hang or slowdown whatsoever. Therefore, instead of using
aws s3 cp s3://my-bucket/path/to/my/object .
I get the object with
aws s3api get-object --bucket my-bucket --key path/to/my/object out-file
and experience no slowdown.
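For callers that use an SDK rather than the CLI, the same single-request approach can be sketched in Python with boto3's `get_object`, which maps to the same GetObject operation as aws s3api get-object. The helper name `fetch_object` and its optional `client` parameter are illustrative assumptions, and the bucket and key are the placeholders from the commands above.

```python
import shutil


def fetch_object(bucket, key, dest_path, client=None):
    """Download one S3 object with a single GetObject call, streaming to disk."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        client = boto3.client("s3")
    response = client.get_object(Bucket=bucket, Key=key)
    with open(dest_path, "wb") as out:
        # response["Body"] is a file-like stream; copy it in chunks rather
        # than reading the whole object into memory.
        shutil.copyfileobj(response["Body"], out)
    return response.get("ContentLength")


# Example call (needs AWS credentials and network access):
# fetch_object("my-bucket", "path/to/my/object", "out-file")
```

Streaming the body keeps memory use flat for large objects such as the 500MB case described here. Whether it avoids the hang in a given environment is not something this sketch can confirm.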
AWS S3 is slow and painfully complex, and you can't easily search for files. If used with CloudFront, it is faster and there are supposed to be advantages, but complexity shifts from very complex to insanely complex because caching obscures any file changes, and invalidating the cache is hit and miss unless you change the file name, which involves changing the file name in the page referencing that file.
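The file-renaming workaround is often done by putting a content hash in the file name, so that a changed file automatically gets a new name. The following is a minimal sketch of that idea, not something the answer itself prescribes; the helper names, the 8-character digest length, and the example paths are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_chars=8):
    """Return the path with a short content hash inserted before the suffix."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def rewrite_references(page, old_name, new_name):
    """Point every reference in the page text at the renamed file."""
    return page.replace(old_name, new_name)


if __name__ == "__main__":
    css = b"body { margin: 0; }"
    new_name = versioned_name("css/site.css", css)
    page = '<link rel="stylesheet" href="css/site.css">'
    print(rewrite_references(page, "css/site.css", new_name))
```

Because the name depends only on the content, an unchanged file keeps its name and stays cached, while an edited file is fetched under a new name without an explicit invalidation.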
In practice, particularly if all or most of your traffic is located in the same region as your load balancer, I have found that even a low-spec web server located in the same region is faster by a factor of 10. If you need multiple web servers attached to a common volume, AWS only provides this in certain regions, so I got around this by using NFS to share the volume on multiple web servers. This gives you a file system mounted on a server that you can log in to in order to list and find files. S3 has become a turnkey solution for a problem that was solved better a couple of decades ago.
You may try using boto3 to download files instead of aws s3 cp.
Refer to "Downloading a File from an S3 Bucket".
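A minimal sketch of the boto3 route, assuming boto3 is installed and credentials are configured. `download_file` is boto3's managed-transfer call; the helper name `download_many`, its optional `client` parameter, and the example keys are illustrative assumptions.

```python
def download_many(bucket, keys_to_paths, client=None):
    """Download several S3 objects, reusing one boto3 client for all of them."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        client = boto3.client("s3")
    for key, dest_path in keys_to_paths.items():
        # download_file(Bucket, Key, Filename) writes the object to a local file.
        client.download_file(bucket, key, dest_path)
    return list(keys_to_paths.values())


# Example call (needs AWS credentials and network access):
# download_many("my-bucket", {"path/to/my/object": "out-file"})
```

Creating the client once inside a long-running process avoids starting a new CLI process for every transfer, which may matter more than raw bandwidth for small files.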