
Given an archive_id, how might I go about moving an archive from AWS Glacier to an S3 Bucket?

I have written an archival system with Python Boto that tars several directories of files and uploads them to Glacier. This is all working great, and I am storing all of the archive IDs.

I wanted to test downloading a large archive (about 120GB). I initiated the retrieval, but the download took more than 24 hours, and at the end I got a 403 since the resource was no longer available and the download failed.

If I archived straight from my server to Glacier (skipping S3), is it possible to initiate a restore that restores an archive to an S3 bucket, so that I can take longer than 24 hours to download a copy? I didn't see anything in either the S3 or Glacier Boto docs.

Ideally I'd do this with Boto, but I would be open to other scriptable options. Does anyone know how, given an archiveId, I might go about moving an archive from AWS Glacier to an S3 Bucket? If this is not possible, are there other options to give myself more time to download large files?

Thanks!

http://docs.pythonboto.org/en/latest/ref/glacier.html
http://docs.pythonboto.org/en/latest/ref/s3.html

The direct Glacier API and the S3/Glacier integration are not connected to each other in a way that is accessible to AWS users.

If you upload directly to Glacier, the only way to get the data back is to fetch it back directly from Glacier.

Conversely, if you add content to Glacier via S3 lifecycle policies, then there is no exposed Glacier archive ID, and the only way to get the content is to do an S3 restore.

It's essentially as if "you" aren't the Glacier customer, but rather "S3" is the Glacier customer, when you use the Glacier/S3 integration. (In fact, that's a pretty good mental model -- the Glacier storage charges are even billed differently: files stored through the S3 integration are billed together with the other S3 charges on the monthly invoice, not with the Glacier charges.)

The way to accomplish what you're trying to do is with range retrievals, where you ask Glacier to restore only a portion of the archive.

Another reason you could choose to perform a range retrieval is to manage how much data you download from Amazon Glacier in a given period. When data is retrieved from Amazon Glacier, a retrieval job is first initiated, which will typically complete in 3-5 hours. The data retrieved is then available for download for 24 hours. You could therefore retrieve an archive in parts in order to manage the schedule of your downloads. You may also choose to perform range retrievals in order to reduce or eliminate your retrieval fees.

http://aws.amazon.com/glacier/faqs/

You'd then need to reassemble the pieces. That last part seems like a big advantage too, since Glacier charges more the more data you "restore" at a time. Note this isn't a charge for downloading the data; it's a charge for the restore operation, whether you download it or not.
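Here is a minimal sketch of that approach using boto's Glacier Layer1 API. The vault name is hypothetical, the archive size is assumed to be known from your vault inventory, and the 4 GiB chunk and polling interval are arbitrary illustrations; in practice you'd likely stagger the range jobs across days so each piece stays within its 24-hour download window.

```python
import time
import boto.glacier.layer1

VAULT_NAME = 'my-backups'             # hypothetical vault name
ARCHIVE_ID = '...your archive_id...'  # the ID you stored at upload time
TOTAL_SIZE = 120 * 1024 ** 3          # archive size in bytes, from the vault inventory
CHUNK = 4 * 1024 ** 3                 # 4 GiB ranges; ranges must be megabyte-aligned

glacier = boto.glacier.layer1.Layer1()  # credentials from boto config / environment

start = 0
with open('archive.tar', 'wb') as out:
    while start < TOTAL_SIZE:
        end = min(start + CHUNK, TOTAL_SIZE) - 1

        # Initiate a retrieval job for just this byte range
        job = glacier.initiate_job(VAULT_NAME, {
            'Type': 'archive-retrieval',
            'ArchiveId': ARCHIVE_ID,
            'RetrievalByteRange': '%d-%d' % (start, end),
        })
        job_id = job['JobId']

        # Poll until the job completes (typically 3-5 hours)
        while not glacier.describe_job(VAULT_NAME, job_id)['Completed']:
            time.sleep(15 * 60)

        # Download this range within its 24-hour window, streaming it to
        # disk in order so the pieces reassemble into the original archive
        output = glacier.get_job_output(VAULT_NAME, job_id)
        while True:
            data = output.read(1024 * 1024)
            if not data:
                break
            out.write(data)

        start = end + 1
```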

One advantage I see of the S3 integration is that you can leave your data "cooling off" in S3 for a few hours/days/weeks before you put it "on ice" in Glacier, which happens automatically... so you can fetch it back from S3 without paying a retrieval charge until it's been sitting in S3 for the amount of time you've specified, after which it automatically migrates. The potential downside is that it seems to introduce more moving parts.
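As a sketch of how that cooling-off period is configured (the bucket name, prefix, and 30-day window here are all hypothetical), boto's lifecycle support lets you define the transition rule:

```python
import boto
from boto.s3.lifecycle import Lifecycle, Transition

s3 = boto.connect_s3()
bucket = s3.get_bucket('my-archive-bucket')  # hypothetical bucket

# Move objects under backups/ to Glacier 30 days after creation, leaving
# a 30-day window of normal S3 access with no retrieval charge.
lifecycle = Lifecycle()
lifecycle.add_rule(id='archive-to-glacier', prefix='backups/',
                   status='Enabled', expiration=None,
                   transition=Transition(days=30, storage_class='GLACIER'))
bucket.configure_lifecycle(lifecycle)
```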

Using document lifecycle policies you can move files directly from S3 to Glacier, and you can also restore those objects back to S3 using the restore method of the boto.s3.Key object. Also, see this section of the S3 docs for more information on how restore works.
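A minimal sketch of that restore call, with a hypothetical bucket and key name; the days argument is what buys you more than 24 hours, since the restored copy stays in S3 for that long:

```python
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('my-archive-bucket')           # hypothetical
key = bucket.get_key('backups/archive-2014-01.tar')   # hypothetical

# Ask S3 to pull the Glacier-stored object back into S3 and keep the
# restored copy available for 30 days.
key.restore(days=30)

# The restore is asynchronous; re-fetch the key to check its progress.
key = bucket.get_key(key.name)
if key.ongoing_restore:
    print('Restore still in progress; try the download later.')
else:
    key.get_contents_to_filename('archive-2014-01.tar')
```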
