
Data replication from MySQL to Redshift

I would like to load data from MySQL into Redshift.

My data values can change at any time, so I need to capture both old and new records in Redshift.

Modified records need to be archived; only the latest version of each record should be reflected in Redshift.

For example:

MySQL table:

ID    NAME    SAL
--    ----    -----
1     XYZ     10000
2     ABC     20000

For the first load into Redshift (this should be the same as the MySQL table):

ID       NAME     SAL
--       ----     ----
1        XYZ      10000
2        ABC      20000

For the second load (I changed the salary of employee 'XYZ' from 10000 to 30000):

ID      NAME       SAL
--      ----       ----
1       XYZ        30000
2       ABC        20000

The table above should be reflected in Redshift, and the superseded record (1 XYZ 10000) should be archived.

Is this possible?

How many rows are you expecting?

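One approach would be to add a timestamp column that gets updated to the current time whenever a record is modified. In MySQL, a column declared with DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP does this automatically.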

Then, with an external process doing the replication run, you can get the maximum timestamp from Redshift and select any records from MySQL whose timestamp is greater than that. If you use the COPY method to load into Redshift, dump those records to S3 first.
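The extraction step could be sketched roughly as follows. This is only an illustration: in-memory SQLite databases stand in for both MySQL and Redshift so the example runs on its own, and the table name `employee`, the column name `updated_at`, and the helper functions are assumptions, not anything from the question. The CSV string stands in for the file you would upload to S3 before running COPY.

```python
import csv
import io
import sqlite3


def get_high_water_mark(target):
    """Return the latest updated_at already loaded, or '' before the first load."""
    row = target.execute("SELECT MAX(updated_at) FROM employee").fetchone()
    return row[0] or ""


def extract_changes(source, watermark):
    """Select source rows modified after the watermark."""
    return source.execute(
        "SELECT id, name, sal, updated_at FROM employee "
        "WHERE updated_at > ? ORDER BY id",
        (watermark,),
    ).fetchall()


def to_csv(rows):
    """Serialize rows as CSV text (stand-in for the file dumped to S3)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Stand-ins: 'source' plays MySQL, 'target' plays Redshift.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute(
        "CREATE TABLE employee (id INTEGER, name TEXT, sal INTEGER, updated_at TEXT)"
    )

# The target holds the first load; the source has since changed XYZ's salary.
target.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "XYZ", 10000, "2024-01-01 09:00:00"),
     (2, "ABC", 20000, "2024-01-01 09:00:00")],
)
source.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "XYZ", 30000, "2024-01-02 09:00:00"),
     (2, "ABC", 20000, "2024-01-01 09:00:00")],
)

watermark = get_high_water_mark(target)
changed = extract_changes(source, watermark)
csv_payload = to_csv(changed)
```

Only the modified XYZ row is picked up here, because ABC's timestamp is not greater than the watermark. One caveat of this approach is that rows deleted in MySQL are never seen by the query, so deletes would need separate handling.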

To load new records and archive old ones, you'll need a variation of the Redshift upsert pattern. This involves loading into a staging table, identifying the records in the original table to be archived, moving those to a separate archive table or UNLOADing them to an S3 archive, and then using ALTER TABLE APPEND to move the new records into the main table. Note that ALTER TABLE APPEND needs a permanent source table, so the staging table cannot be a TEMP table if you take that route.
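The archive-then-replace sequence could look something like the sketch below. It again uses an in-memory SQLite database as a stand-in for Redshift so that it runs as written; the table names `employee`, `employee_staging`, and `employee_archive` are hypothetical. SQLite has no ALTER TABLE APPEND, so the final step is a plain INSERT ... SELECT here, which is also a valid way to do it in Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee         (id INTEGER PRIMARY KEY, name TEXT, sal INTEGER);
CREATE TABLE employee_archive (id INTEGER, name TEXT, sal INTEGER);
CREATE TABLE employee_staging (id INTEGER PRIMARY KEY, name TEXT, sal INTEGER);

-- State after the first load.
INSERT INTO employee VALUES (1, 'XYZ', 10000), (2, 'ABC', 20000);

-- Changed rows delivered by the second replication run.
INSERT INTO employee_staging VALUES (1, 'XYZ', 30000);
""")


def apply_staged_changes(conn):
    """Archive superseded rows, then replace them with the staged versions."""
    with conn:  # commit all four statements together, or roll back on error
        # 1. Copy the old versions of changed rows into the archive.
        conn.execute(
            "INSERT INTO employee_archive "
            "SELECT e.id, e.name, e.sal FROM employee e "
            "JOIN employee_staging s ON s.id = e.id"
        )
        # 2. Remove those old versions from the main table.
        conn.execute(
            "DELETE FROM employee WHERE id IN (SELECT id FROM employee_staging)"
        )
        # 3. Bring in the new and changed rows.
        conn.execute(
            "INSERT INTO employee SELECT id, name, sal FROM employee_staging"
        )
        # 4. Empty the staging table for the next run.
        conn.execute("DELETE FROM employee_staging")


apply_staged_changes(conn)
current = conn.execute("SELECT id, name, sal FROM employee ORDER BY id").fetchall()
archived = conn.execute("SELECT id, name, sal FROM employee_archive").fetchall()
```

After the run, `current` holds the two rows from the second-load table in the question and `archived` holds the old (1, XYZ, 10000) record. If you want a full history, adding an archived-at timestamp column to the archive table would be a natural extension.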
