Data replication from MySQL to Redshift
I would like to load data from MySQL into Redshift.
My data values can change at any time, so I need to capture both the old and the new records in Redshift. Modified records need to be archived; only the latest records should be reflected in Redshift.
For example:
MySQL table:
ID NAME SAL
-- ---- -----
1 XYZ 10000
2 ABC 20000
For the first load into Redshift (this should be the same as the MySQL table):
ID NAME SAL
-- ---- ----
1 XYZ 10000
2 ABC 20000
For the second load (I changed the salary of employee 'XYZ' from 10000 to 30000):
ID NAME SAL
-- ---- ----
1 XYZ 30000
2 ABC 20000
The table above should be reflected in Redshift, and the modified record (1 XYZ 10000) should be archived.
Is this possible?
How many rows are you expecting?
One approach would be to add a timestamp column that gets updated to the current time whenever a record is modified.
Then, with an external process doing a replication run, you could get the max timestamp from Redshift, select any records from MySQL that are newer than that timestamp, and dump them to S3 so you can load them into Redshift with the COPY command.
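A minimal sketch of that change-tracking flow, assuming illustrative names throughout (table `emp`, column `updated_at`, staging table `emp_staging`, the S3 path, and the IAM role are all placeholders, not anything from your schema):

```sql
-- MySQL: add a change-tracking column that MySQL maintains automatically
-- on every INSERT and UPDATE (names here are illustrative)
ALTER TABLE emp
  ADD COLUMN updated_at TIMESTAMP NOT NULL
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Replication run, MySQL side: export rows changed since the watermark
-- previously fetched from Redshift with SELECT MAX(updated_at) FROM emp;
SELECT id, name, sal, updated_at
FROM emp
WHERE updated_at > '2019-01-01 00:00:00';  -- watermark from Redshift

-- Redshift side: load the exported file from S3 into a staging table
COPY emp_staging
FROM 's3://your-bucket/emp/changes.csv'    -- placeholder path
IAM_ROLE 'arn:aws:iam::<account-id>:role/<your-copy-role>'
CSV;
```

The watermark query keeps the export incremental: each run only ships rows modified since the last successful load.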
To load new records and archive the old ones, you'll need a variation of a Redshift upsert pattern. This involves loading into a temporary table, identifying the records in the main table that need to be archived, moving those to an archive table (or UNLOADing them to S3), and then using ALTER TABLE APPEND to move the new records into the main table.
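The archive-then-append steps above could look roughly like this on the Redshift side. All names are illustrative, and it assumes the changed rows have already been COPYed into `emp_staging` with the same schema as `emp`:

```sql
-- Archive the outdated versions and remove them from the main table
-- atomically (illustrative table names)
BEGIN;

-- 1. Copy the old versions of any row that has a newer version staged
--    into the archive table
INSERT INTO emp_archive
SELECT e.*
FROM emp e
JOIN emp_staging s ON s.id = e.id;

-- 2. Remove those outdated rows from the main table
DELETE FROM emp
USING emp_staging s
WHERE emp.id = s.id;

COMMIT;

-- 3. Move the staged rows into the main table. ALTER TABLE APPEND moves
--    the blocks instead of copying rows, but it cannot run inside a
--    transaction block, so it is issued on its own.
ALTER TABLE emp APPEND FROM emp_staging;
```

If you prefer an S3 archive over an archive table, replace step 1 with an UNLOAD of the same join to an S3 prefix.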