
How to store daily builds in Amazon S3 cost-effectively?

I'm trying to set up a daily build machine on EC2 and store the daily releases in S3.

The releases are complete disk images, so they are very bloated (300+ MB in total: 95% OS kernel, RFS and libraries, 5% actual software), and they change very little over time.

Ideally, with good compression, the storage cost should be close to O(t), where t is time.

But if I simply add those files to S3 every day, either with the version number as part of the file name or with the same file name each time in a versioned S3 bucket, the cost would be O(t^2).

That is because, according to this, all versions take up space, and I'm charged for the space each version occupies from the moment it is created.
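To make the growth rates concrete, here is a rough Python sketch that counts billed storage in MB-days rather than dollars, so no S3 price is assumed. The 300 MB image size comes from the question; the 15 MB daily delta is purely an assumption (5% of the image, matching the share described as actual software), and the function names are made up for illustration.

```python
IMAGE_MB = 300   # full image size, from the question
DELTA_MB = 15    # ASSUMED size of one daily delta (hypothetical: 5% of the image)


def stored_mb_full(day: int) -> int:
    """MB held on a given day (1-based) when every daily image is kept whole."""
    return IMAGE_MB * day


def stored_mb_delta(day: int) -> int:
    """MB held on a given day when day 1 is a full image and later days are deltas."""
    return IMAGE_MB + DELTA_MB * (day - 1)


def billed_mb_days(stored_mb, days: int) -> int:
    """Cumulative MB-days billed after `days` days: the sum of the daily stored sizes."""
    return sum(stored_mb(day) for day in range(1, days + 1))


if __name__ == "__main__":
    for days in (30, 365):
        full = billed_mb_days(stored_mb_full, days)
        delta = billed_mb_days(stored_mb_delta, days)
        print(f"{days:>3} days: full={full / 1024:.0f} GB-days, "
              f"delta={delta / 1024:.0f} GB-days")
```

Strictly speaking the delta scheme is still quadratic, but the quadratic term scales with the delta size instead of the image size, so the bill approaches the linear ideal as the deltas shrink.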

Glacier is cheaper but still O(t^2).

Any suggestions?

Basically, what you're looking for is an incremental file-level backup: back up only the things that change, and rebuild the current state from a full backup by applying the deltas (i.e. the increments).
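As a rough illustration of "back up only the things that change", the Python sketch below hashes every file under a directory and compares two such manifests. It assumes the root filesystem is available as a plain directory tree before it is packed into an image, which the question does not confirm; `build_manifest` and `diff_manifests` are hypothetical helper names. Symlinks are skipped and each file is read fully into memory, both simplifications.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Simplification: symlinks are ignored rather than recorded.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(old: dict[str, str], new: dict[str, str]):
    """Return (changed_or_added, deleted) relative paths between two manifests."""
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

Only the paths in `changed_or_added` would need uploading for a given day, together with the list of deletions so that the restore can remove stale files.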

If you need to use the latest image, you probably need to store increments and also keep the latest image whole. You will also probably want to take full backups from time to time to reduce the time it takes to rebuild from increments, and you will need to keep some sort of metadata associated with the backups.
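One possible shape for that metadata, sketched in Python: a catalog that records for each build whether it was stored as a full image or as a delta against the previous day, plus a function that works out which objects to fetch for a restore. The `Backup` record, the `builds/...` key layout, and the "full image every N builds" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int    # build number, increasing by 1 per daily build
    kind: str   # "full" or "delta" (a delta is relative to the previous day's image)
    key: str    # hypothetical object key in the bucket


def plan_backups(days: int, full_every: int) -> list[Backup]:
    """Catalog for `days` builds with a full image every `full_every` builds."""
    catalog = []
    for day in range(1, days + 1):
        kind = "full" if (day - 1) % full_every == 0 else "delta"
        catalog.append(Backup(day, kind, f"builds/{day:05d}.{kind}"))
    return catalog


def restore_chain(catalog: list[Backup], target_day: int) -> list[Backup]:
    """Objects to fetch, oldest first, to rebuild the image for target_day."""
    by_day = {b.day: b for b in catalog}
    chain = []
    day = target_day
    while True:
        backup = by_day[day]  # a KeyError here means the chain is broken
        chain.append(backup)
        if backup.kind == "full":
            break
        day -= 1  # walk back until the nearest full image is found
    chain.reverse()
    return chain
```

With `full_every=7`, no restore ever has to apply more than six deltas, at the price of storing one full image per week. For example, `restore_chain(plan_backups(10, 7), 10)` yields the full image for day 8 followed by the deltas for days 9 and 10.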

So, to sum up: what you are describing is possible, but it takes extra work beyond just pushing the image. Presumably you have a build process that generates the image, and the extra steps can be inserted between generation and upload. The restore process will be more complicated than it is now.

To get started, look at binary diff tools like bsdiff/bspatch or xdelta. You could generate the delta and back up only the delta. The image is also compressed, and diffing the compressed versions will not get you very far, so you probably want to diff the uncompressed files. Another approach is to do the diff before generating the image and pick up only the files that changed (probably more complex).
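A minimal sketch of the delta step in Python, assuming the `xdelta3` command-line tool is installed and that yesterday's uncompressed image is still on the build machine. The helpers only assemble the encode (`-e`) and decode (`-d`) invocations with `-s` naming the source file; the wrapper functions and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def encode_cmd(old_image: Path, new_image: Path, delta: Path) -> list[str]:
    """xdelta3 command that writes `delta`, the difference from old_image to new_image."""
    return ["xdelta3", "-e", "-s", str(old_image), str(new_image), str(delta)]


def decode_cmd(old_image: Path, delta: Path, rebuilt: Path) -> list[str]:
    """xdelta3 command that rebuilds the newer image from old_image plus `delta`."""
    return ["xdelta3", "-d", "-s", str(old_image), str(delta), str(rebuilt)]


def run(cmd: list[str]) -> None:
    """Run a command, raising if the tool is missing or exits with an error."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed")
    subprocess.run(cmd, check=True)
```

In the build script this might be called as `run(encode_cmd(Path("build-0041.img"), Path("build-0042.img"), Path("build-0042.xdelta")))`, after which only the `.xdelta` file is uploaded. Restoring runs `decode_cmd` once per delta, starting from the nearest full image.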


 