DynamoDB backup and restore using Data Pipeline. How long does it take to back up and restore?

I'm planning to use AWS Data Pipeline as a backup and recovery tool for our DynamoDB tables. We will use Amazon's prebuilt pipeline to back up to S3, and the prebuilt recovery pipeline to restore into a new table in case of a disaster.

This will also serve the dual purpose of archiving data for legal and compliance reasons. We have explored snapshots, but they can get quite expensive compared to S3. Does anyone have an estimate of how long it takes to back up a 1 TB database? And how long does it take to restore a 1 TB database?

I've read the Amazon docs, which say restoring from a snapshot can take up to 20 minutes, but they don't mention how long a restore through Data Pipeline takes. Does anyone have any clues?

Does the newly released DynamoDB export to S3 feature do what you need for your use case? Note that to use it, you must have continuous backups enabled. Perhaps that will also give you the short-term backups you need?
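A minimal sketch of that flow in Python, assuming the boto3 DynamoDB client: `update_continuous_backups` turns on continuous backups (point-in-time recovery), then `export_table_to_point_in_time` starts the export. The function name `export_table_to_s3` and the table ARN, bucket, and prefix values are placeholders for illustration; IAM permissions, the bucket policy, and polling until the export finishes are left out.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def export_table_to_s3(client: Any, table_name: str, table_arn: str,
                       bucket: str, prefix: Optional[str] = None,
                       export_time: Optional[datetime] = None) -> str:
    """Enable PITR (required for export), start an export, return its ARN.

    `client` is assumed to be a boto3 DynamoDB client (or a stand-in for one).
    """
    # Export to S3 only works on tables with continuous backups (PITR) enabled.
    client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    request: Dict[str, Any] = {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "ExportFormat": "DYNAMODB_JSON",
    }
    if prefix is not None:
        request["S3Prefix"] = prefix
    if export_time is not None:
        # Export the table as it was at this moment (must be inside the PITR window).
        request["ExportTime"] = export_time
    response = client.export_table_to_point_in_time(**request)
    return response["ExportDescription"]["ExportArn"]
```

Usage would look something like `export_table_to_s3(boto3.client("dynamodb"), "Orders", orders_arn, "my-backup-bucket", prefix="dynamodb/orders/")`, where `Orders`, `orders_arn`, and the bucket name are hypothetical. The client is passed in rather than created inside the function so the call shapes can be checked with a stub.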

It would be interesting to know why you're not planning to use the built-in backup mechanism. It offers point-in-time recovery, and it is highly predictable in terms of both cost and performance.

Data Pipeline backups are unpredictable, will very likely cost more, and are operationally much less reliable. Plus, getting a consistent snapshot (i.e., as of a single point in time) requires stopping the world, meaning no writes to the table while the backup runs. Speaking from experience, I don't recommend using Data Pipeline for backing up DynamoDB tables!
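To see why a scan-based copy taken without stopping writes is not a point-in-time snapshot, here is a toy model in plain Python (not DynamoDB itself): a hypothetical two-item table is copied one page at a time, while a concurrent writer moves a balance from one item to the other between pages.

```python
from typing import Callable, Dict, List, Optional


def paged_scan(table: Dict[str, int], page_size: int,
               between_pages: Optional[Callable[[int], None]] = None) -> Dict[str, int]:
    """Copy `table` page by page, like a scan-based backup, calling
    `between_pages(page_index)` after each page to simulate concurrent writes."""
    keys: List[str] = sorted(table)  # fixed key order for the whole scan
    copy: Dict[str, int] = {}
    for page_index, start in enumerate(range(0, len(keys), page_size)):
        for key in keys[start:start + page_size]:
            copy[key] = table[key]  # reads the value *now*, not as of scan start
        if between_pages is not None:
            between_pages(page_index)
    return copy


accounts = {"alice": 100, "bob": 0}  # hypothetical table; total is always 100


def transfer_after_first_page(page_index: int) -> None:
    # A concurrent writer moves 100 from alice to bob mid-scan.
    if page_index == 0:
        accounts["alice"] -= 100
        accounts["bob"] += 100


backup = paged_scan(accounts, page_size=1, between_pages=transfer_after_first_page)
print(backup)                # {'alice': 100, 'bob': 100}
print(sum(backup.values()))  # 200, a total the table never actually held
```

Every real state of the table sums to 100, yet the copy sums to 200, because each page reads whatever the item holds at the moment the scan reaches it. Point-in-time recovery and the export feature avoid this by working from a single timestamp.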

As for how long a backup takes, that depends on a number of factors, but mostly on the size of the table, the provisioned read capacity you're willing to throw at it, and the size of the EMR cluster you're willing to work with. So it could take anywhere from a minute to several hours.
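A back-of-the-envelope way to reason about the read side, as a sketch: a full-table scan is limited by read capacity, where one RCU covers one strongly consistent read of up to 4 KB per second, or two eventually consistent ones. The helper below ignores EMR cluster startup, per-request overhead, and throttling. `throughput_ratio` is an assumed knob for the share of provisioned capacity the job may use; I believe the Data Pipeline export template exposes a similar read throughput ratio setting, but treat the mapping as approximate.

```python
# Approximate bytes one RCU can read per second during a scan.
BYTES_PER_RCU_STRONG = 4 * 1024    # 1 strongly consistent read/s, up to 4 KB
BYTES_PER_RCU_EVENTUAL = 8 * 1024  # 2 eventually consistent reads/s, up to 4 KB each


def estimate_scan_seconds(table_bytes: int, read_capacity_units: float,
                          throughput_ratio: float = 1.0,
                          eventually_consistent: bool = True) -> float:
    """Rough lower bound on how long a full scan of `table_bytes` takes."""
    if read_capacity_units <= 0:
        raise ValueError("read_capacity_units must be positive")
    if not 0 < throughput_ratio <= 1:
        raise ValueError("throughput_ratio must be in (0, 1]")
    per_rcu = BYTES_PER_RCU_EVENTUAL if eventually_consistent else BYTES_PER_RCU_STRONG
    bytes_per_second = read_capacity_units * throughput_ratio * per_rcu
    return table_bytes / bytes_per_second


ONE_TIB = 2 ** 40
print(round(estimate_scan_seconds(ONE_TIB, 10_000) / 3600, 2))        # 3.73 (hours)
print(round(estimate_scan_seconds(ONE_TIB, 10_000, 0.25) / 3600, 2))  # 14.91 (hours)
```

Under these assumptions, a 1 TiB scan at 10,000 RCU takes roughly 3.7 hours if the job may use all of that capacity, and roughly 15 hours at a quarter of it, which fits the "minutes to hours" range above.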

Restore time depends on pretty much the same variables: provisioned write capacity and total size. It, too, can take anywhere from a minute to many hours.
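The write side can be estimated the same way under similar simplifying assumptions: one WCU covers one standard write per second of an item up to 1 KB, and each item is rounded up to whole 1 KB units, so many small items cost more than their raw byte count suggests. Using a single `avg_item_bytes` assumes items are roughly the same size; with widely varying sizes the per-item rounding can make this estimate off in either direction.

```python
import math

WRITE_UNIT_BYTES = 1024  # one standard write of up to 1 KB per WCU per second


def estimate_restore_seconds(item_count: int, avg_item_bytes: int,
                             write_capacity_units: float,
                             throughput_ratio: float = 1.0) -> float:
    """Rough estimate of how long writing `item_count` items back takes."""
    if item_count < 0 or avg_item_bytes <= 0:
        raise ValueError("item_count must be >= 0 and avg_item_bytes > 0")
    if write_capacity_units <= 0 or not 0 < throughput_ratio <= 1:
        raise ValueError("capacity must be positive and ratio in (0, 1]")
    # Each write is billed in whole 1 KB units, rounded up per item.
    units_per_item = math.ceil(avg_item_bytes / WRITE_UNIT_BYTES)
    total_units = item_count * units_per_item
    return total_units / (write_capacity_units * throughput_ratio)


# 1 TiB stored as 2**30 items of 1 KiB each, restored at 10,000 WCU.
print(round(estimate_restore_seconds(2 ** 30, 1024, 10_000) / 3600, 2))  # 29.83 (hours)
```

That comes out around 30 hours, which illustrates why restoring a large table through a pipeline can take many hours unless you provision a lot of write capacity.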

Point-in-time backups offer consistent, predictable and, most importantly, reliable performance regardless of the size of the table: use them!
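For comparison, a point-in-time restore into a new table is a single API call. A minimal sketch, assuming the boto3 DynamoDB client's `restore_table_to_point_in_time`; the function name `restore_to_new_table` and the table names are hypothetical, and the target table must not already exist.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def restore_to_new_table(client: Any, source_table: str, target_table: str,
                         restore_time: Optional[datetime] = None) -> str:
    """Start a PITR restore of `source_table` into a new `target_table`.

    Returns the new table's status as reported by the API (e.g. CREATING).
    """
    request: Dict[str, Any] = {
        "SourceTableName": source_table,
        "TargetTableName": target_table,  # must be a table that does not exist yet
    }
    if restore_time is None:
        # Restore to the most recent restorable moment.
        request["UseLatestRestorableTime"] = True
    else:
        # Restore to a specific moment inside the PITR window.
        request["RestoreDateTime"] = restore_time
    response = client.restore_table_to_point_in_time(**request)
    return response["TableDescription"]["TableStatus"]
```

After the call returns, something like `client.get_waiter("table_exists").wait(TableName="Orders-restored")` can block until the new table is active.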

And if you're only interested in dumping the data from the table (i.e., you don't necessarily need the restore part), use the new export to S3 feature.

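Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.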

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM