
Archiving AWS RDS MySQL Database

I am looking for options to archive old data from specific tables of an AWS RDS MySQL database. I came across AWS S3 and AWS Glacier, and copying the data to either one using pipelines or buckets, but from what I understood these copy the data to a vault or back it up; they don't move it.

Is there a proper option to archive the data by moving it from RDS to S3, Glacier, or Deep Archive? That is, deleting it from the table in AWS RDS after creating an archive. What would be the best option for the archival process given my requirements, and would it affect the replicas that already exist?

The biggest consideration when "archiving" the data is ensuring that it is in a useful format should you ever want it back again.

Amazon RDS recently added the ability to export RDS snapshot data to Amazon S3.

Thus, the flow could be:

  • Create a snapshot of the Amazon RDS database
  • Export the snapshot to Amazon S3 as a Parquet file (you can choose to export specific sets of databases, schemas, or tables)
  • Set the Storage Class on the exported file as desired (e.g. Glacier Deep Archive)
  • Delete the data from the source database (make sure you keep a Snapshot or test the Export before deleting the data!)
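The snapshot and export steps above map onto a couple of API calls. A minimal sketch with the AWS SDK for JavaScript v3 follows; the instance name, bucket, IAM role, KMS key alias, and account ID are all placeholder assumptions, and the SDK is required lazily so the pure helper runs on its own:

```javascript
// Sketch of the snapshot-then-export flow (AWS SDK for JavaScript v3).
// Every identifier below (instance, bucket, role, KMS key, account) is a
// placeholder; adjust to your own account.

// Pure helper: a date-stamped snapshot identifier.
function snapshotId(instanceId, date) {
  return `${instanceId}-archive-${date.toISOString().slice(0, 10)}`;
}

async function archiveSnapshot() {
  // Required lazily so the helper above works without the SDK installed.
  const {
    RDSClient,
    CreateDBSnapshotCommand,
    StartExportTaskCommand,
    waitUntilDBSnapshotAvailable,
  } = require("@aws-sdk/client-rds");

  const rds = new RDSClient({ region: "us-east-1" });
  const id = snapshotId("my-db-instance", new Date());

  // 1. Take a manual snapshot of the instance.
  await rds.send(new CreateDBSnapshotCommand({
    DBInstanceIdentifier: "my-db-instance",
    DBSnapshotIdentifier: id,
  }));

  // 2. Wait until the snapshot is "available"; exports cannot start earlier.
  await waitUntilDBSnapshotAvailable(
    { client: rds, maxWaitTime: 1800 },
    { DBSnapshotIdentifier: id }
  );

  // 3. Export the snapshot to S3 as Parquet. ExportOnly restricts the export
  //    to specific databases, schemas, or tables.
  await rds.send(new StartExportTaskCommand({
    ExportTaskIdentifier: `${id}-export`,
    SourceArn: `arn:aws:rds:us-east-1:123456789012:snapshot:${id}`,
    S3BucketName: "my-archive-bucket",
    IamRoleArn: "arn:aws:iam::123456789012:role/rds-s3-export",
    KmsKeyId: "alias/my-export-key", // the export API requires a KMS key
    ExportOnly: ["mydb.big_table"],
  }));
}
```

The export API always encrypts the output with a KMS key, which is why one is passed above. The exported Parquet objects can then be transitioned to Glacier Deep Archive with an S3 lifecycle rule on the bucket.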

When you later wish to access the data:

  • Restore the data if necessary (based upon the Storage Class)
  • Use Amazon Athena to query the data directly from Amazon S3
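If the exported files were pushed down to a Glacier-class tier, they must be restored before Athena can read them. A rough sketch of the retrieval path, again with placeholder bucket, key, and table names, and assuming a Glue/Athena table has already been defined over the export prefix:

```javascript
// Hedged sketch of the retrieval path: restore an archived Parquet object
// from Deep Archive, then run a query through Athena. All names below are
// placeholders.

// Pure helper: the Athena query we intend to run.
function athenaQuery(table, cutoff) {
  return `SELECT * FROM ${table} WHERE created_at < DATE '${cutoff}'`;
}

async function queryArchive() {
  // Required lazily so the helper above works without the SDKs installed.
  const { S3Client, RestoreObjectCommand } = require("@aws-sdk/client-s3");
  const {
    AthenaClient,
    StartQueryExecutionCommand,
  } = require("@aws-sdk/client-athena");

  // 1. Ask S3 to restore the object from Deep Archive. A Bulk restore can
  //    take many hours; the copy stays readable for `Days` days.
  const s3 = new S3Client({ region: "us-east-1" });
  await s3.send(new RestoreObjectCommand({
    Bucket: "my-archive-bucket",
    Key: "exports/mydb/big_table/part-00000.parquet",
    RestoreRequest: { Days: 7, GlacierJobParameters: { Tier: "Bulk" } },
  }));

  // 2. Once the restore completes, query via Athena (assumes a table named
  //    archive_db.big_table pointing at the S3 prefix).
  const athena = new AthenaClient({ region: "us-east-1" });
  await athena.send(new StartQueryExecutionCommand({
    QueryString: athenaQuery("archive_db.big_table", "2024-01-01"),
    ResultConfiguration: {
      OutputLocation: "s3://my-archive-bucket/athena-results/",
    },
  }));
}
```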

Recently I built a similar pipeline using an AWS Lambda function that runs on a cron schedule (CloudWatch Events) every month to take a manual snapshot of the RDS instance, export it to S3, and delete the records that are older than n days.

I made a gist of the util class that I used, and I'm adding it here in case it helps anyone: JS util class to create and export DB snapshots to S3.
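For reference, the skeleton of that handler looks roughly like the following; the table name, column name, and 90-day retention window are assumptions, and the snapshot/export calls and the actual MySQL client are elided:

```javascript
// Skeleton of a monthly archival Lambda, triggered by an EventBridge
// (CloudWatch Events) cron rule. Table/column names and the retention
// window are assumptions.

// Pure helper: the retention cutoff as an ISO date, n days before `now`.
function cutoffDate(now, days) {
  const d = new Date(now);
  d.setUTCDate(d.getUTCDate() - days);
  return d.toISOString().slice(0, 10);
}

// Pure helper: the purge statement run after the export succeeds.
function deleteStatement(table, column, cutoff) {
  return `DELETE FROM ${table} WHERE ${column} < '${cutoff}'`;
}

// Wire this up as the Lambda entry point (exports.handler).
const handler = async () => {
  // 1. Take a manual RDS snapshot and start an export task to S3 (omitted).
  // 2. Only after the export succeeds, purge the archived rows:
  const sql = deleteStatement("events", "created_at", cutoffDate(new Date(), 90));
  // run `sql` against the RDS instance with your MySQL client of choice
  return sql;
};
```

Deleting only after a verified export matches the warning in the accepted answer: never purge rows before confirming the archive exists.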

PS: I just wanted to add this as a comment on the accepted answer, but I don't have enough reputation for that.
