Sync data from Amazon Aurora to Redshift
I am trying to set up a sync between AWS Aurora and Redshift. What is the best way to achieve this sync?
Possible ways to sync could be:
Query a table to find changes (since I am only doing inserts, updates don't matter), export these changes to a flat file in an S3 bucket, and use the Redshift COPY command to load them into Redshift.
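That first approach could be sketched roughly as below. This is a minimal, hypothetical illustration: the bucket, table, cluster, and role names are placeholders, it assumes the changed rows have already been extracted as CSV text, and it uses the Redshift Data API to run the COPY rather than a direct database connection.

```python
def build_copy_command(table, bucket, key, iam_role):
    """Build the Redshift COPY statement for a CSV file sitting in S3."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV;"
    )

def export_and_load(rows_csv, bucket, key, table, iam_role,
                    cluster_id, database, db_user):
    """Upload extracted rows to S3, then COPY them into Redshift."""
    # Imported lazily so build_copy_command stays usable without boto3.
    import boto3

    # Stage the flat file of changed rows in S3.
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=rows_csv.encode("utf-8"))

    # Run the COPY through the Redshift Data API (no JDBC driver needed).
    redshift = boto3.client("redshift-data")
    return redshift.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=build_copy_command(table, bucket, key, iam_role),
    )
```

You would run `export_and_load` on a schedule, tracking a high-water mark (e.g. the max inserted ID) between runs so each export only contains new rows.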
Use a Python publisher with Boto3 to publish changes to a Kinesis stream, then consume this stream with Firehose, from where I can copy directly into Redshift.
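The publisher side of that option might look like the sketch below. The stream name and partition-key field are hypothetical, and the sketch assumes each inserted row arrives as a dict; Firehose would then batch the stream into S3 and issue the Redshift COPY for you.

```python
import json

def encode_change(row):
    """Serialize one inserted row as a newline-delimited JSON record."""
    return (json.dumps(row, default=str) + "\n").encode("utf-8")

def publish_changes(rows, stream_name, partition_key_field="id"):
    """Publish inserted rows, one record each, to a Kinesis data stream."""
    # Imported lazily so encode_change stays usable without boto3.
    import boto3
    kinesis = boto3.client("kinesis")
    for row in rows:
        kinesis.put_record(
            StreamName=stream_name,
            Data=encode_change(row),
            # Partitioning by primary key spreads records across shards.
            PartitionKey=str(row[partition_key_field]),
        )
```

For higher insert rates you would batch with `put_records` instead of calling `put_record` per row.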
Use the Kinesis Agent to detect changes in the binlog (is it even possible to detect binlog changes using the Kinesis Agent?), publish them to Firehose, and from there copy into Redshift.
I haven't explored AWS Data Pipeline yet.
As pointed out by @Mark B, the AWS Database Migration Service can migrate data between databases. This can be done as a one-off exercise, or it can run continuously, keeping the two databases in sync.
The documentation shows that Amazon Aurora can be a source and Amazon Redshift can be a target.
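A DMS task for continuous sync could be set up programmatically along these lines. This is a sketch, not a complete setup: the ARNs are placeholders, and it assumes the source (Aurora) and target (Redshift) endpoints and a replication instance have already been created.

```python
import json

def build_table_mappings(schema, table="%"):
    """DMS table-mapping JSON selecting which tables to replicate."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-schema",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }]
    })

def create_sync_task(task_id, source_arn, target_arn, instance_arn, schema):
    """Create a full-load + CDC task so Redshift stays in sync with Aurora."""
    # Imported lazily so build_table_mappings stays usable without boto3.
    import boto3
    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        # full-load-and-cdc = initial copy, then ongoing change capture.
        MigrationType="full-load-and-cdc",
        TableMappings=build_table_mappings(schema),
    )
```

With `MigrationType="full-load-and-cdc"`, DMS does the initial bulk copy and then applies ongoing changes from the Aurora binlog, which matches the continuous-sync use case in the question.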
AWS has just announced a new feature: Amazon Aurora zero-ETL integration with Amazon Redshift. This natively provides near-real-time (seconds) synchronization from Aurora to Redshift.