Data replication from MySQL to Redshift

I would like to load data from MySQL into Redshift.

My data values can change at any time, so I need to capture both the old and the new records in Redshift.

Modified records need to be archived; only the new records should be reflected in Redshift.

For example:

MySQL table:

ID    NAME    SAL
--    ----    -----
1     XYZ     10000
2     ABC     20000

After the first load into Redshift (this should be the same as the MySQL table):

ID       NAME     SAL
--       ----     ----
1        XYZ      10000
2        ABC      20000

After the second load (I changed the salary of employee 'XYZ' from 10000 to 30000):

ID      NAME       SAL
--      ----       ----
1       XYZ        30000
2       ABC        20000

The table above should be reflected in Redshift, and the modified record (1 XYZ 10000) should be archived.

Is this possible?

How many rows are you expecting?

One approach would be to add a timestamp column that is updated to the current time whenever a record is modified.
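As a minimal sketch of that idea, the helper below builds a MySQL statement that adds a self-updating timestamp column. The table name `employee` and the column name `updated_at` are assumptions for illustration; the question does not name them.

```python
def add_modified_timestamp_sql(table: str, column: str = "updated_at") -> str:
    """Build a MySQL ALTER TABLE statement for a self-updating timestamp.

    The table and column names are hypothetical; adapt them to your schema.
    """
    return (
        f"ALTER TABLE `{table}` "
        f"ADD COLUMN `{column}` TIMESTAMP NOT NULL "
        # MySQL sets the value on insert and refreshes it on every update.
        "DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP"
    )


ddl = add_modified_timestamp_sql("employee")
print(ddl)
```

Run the generated statement once against the MySQL source; after that, every UPDATE touches the column without any change to application code.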

Then an external process doing a replication run can get the max timestamp from Redshift, select any records from MySQL whose timestamp is greater than that value and, if you use the COPY method to load into Redshift, dump them to S3.
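The extraction step could look roughly like the sketch below. To keep it runnable anywhere, an in-memory `sqlite3` database stands in for MySQL, the high-water mark is hard-coded instead of being read from Redshift with `SELECT MAX(updated_at)`, and the CSV is built in memory rather than uploaded to S3. The `employee` table, the `updated_at` column and the sample timestamps are assumptions.

```python
import csv
import io
import sqlite3


def extract_changed_rows(source_conn, high_water_mark):
    """Return rows modified after high_water_mark (None means a full load)."""
    query = "SELECT id, name, sal, updated_at FROM employee"
    params = ()
    if high_water_mark is not None:
        query += " WHERE updated_at > ?"
        params = (high_water_mark,)
    return source_conn.execute(query + " ORDER BY id", params).fetchall()


def rows_to_csv(rows):
    """Serialize rows as CSV text, the shape a COPY from S3 would consume."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Stand-in for the MySQL source (assumption: sqlite3 replaces a MySQL driver).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE employee "
    "(id INTEGER PRIMARY KEY, name TEXT, sal INTEGER, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "XYZ", 30000, "2024-01-02 00:00:00"),  # changed after first load
        (2, "ABC", 20000, "2024-01-01 00:00:00"),  # unchanged
    ],
)

# In a real run this value would come from the Redshift target table.
high_water_mark = "2024-01-01 00:00:00"

changed = extract_changed_rows(source, high_water_mark)
csv_payload = rows_to_csv(changed)
print(csv_payload)
```

A strict `>` comparison can skip rows that share the exact maximum timestamp, so a real job may want `>=` plus de-duplication on the Redshift side.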

To load the new records and archive the old ones, you'll need a variation of the Redshift upsert pattern. This involves loading into a staging table, identifying the records in the original table that need to be archived, moving those to a separate archive table or UNLOADing them to an S3 archive, and then using ALTER TABLE APPEND to move the new records into the main table.
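The archive-then-merge steps can be sketched with the question's own sample data. This is an illustration under assumptions, not a Redshift script: `sqlite3` stands in for Redshift so the example runs locally, the table names `employee`, `employee_staging` and `employee_archive` are made up, and a plain `INSERT ... SELECT` replaces `ALTER TABLE APPEND` (which Redshift does not allow inside a transaction block).

```python
import sqlite3

ARCHIVE_AND_MERGE = [
    # 1. Copy the current versions of the changed rows into the archive.
    "INSERT INTO employee_archive (id, name, sal) "
    "SELECT id, name, sal FROM employee "
    "WHERE id IN (SELECT id FROM employee_staging)",
    # 2. Remove those rows from the main table.
    "DELETE FROM employee WHERE id IN (SELECT id FROM employee_staging)",
    # 3. Move the new versions in (stand-in for ALTER TABLE APPEND).
    "INSERT INTO employee (id, name, sal) "
    "SELECT id, name, sal FROM employee_staging",
    # 4. Empty the staging table for the next run.
    "DELETE FROM employee_staging",
]


def archive_and_merge(conn):
    """Run all steps in one transaction so a failure leaves nothing half-done."""
    with conn:
        for statement in ARCHIVE_AND_MERGE:
            conn.execute(statement)


conn = sqlite3.connect(":memory:")
for table in ("employee", "employee_staging", "employee_archive"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, sal INTEGER)")

# State after the first load.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "XYZ", 10000), (2, "ABC", 20000)],
)
# The second load delivers only the changed record.
conn.execute("INSERT INTO employee_staging VALUES (1, 'XYZ', 30000)")

archive_and_merge(conn)

main_rows = conn.execute("SELECT * FROM employee ORDER BY id").fetchall()
archived_rows = conn.execute("SELECT * FROM employee_archive").fetchall()
print(main_rows, archived_rows)
```

After the run, the main table holds the updated salary for XYZ and the archive table holds the old (1, XYZ, 10000) record, which is the outcome the question asks for.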

