简体   繁体   中英

Slow insert and update commands during mysql to redshift replication

I am trying to make a replication server from MySQL to redshift, for this, I am parsing the MySQL binlog. For initial replication, I am taking the dump of the mysql table, converting it into a CSV file and uploading the same to S3 and then I use the redshift copy command. For this the performance is efficient.

After the initial replication, for the continuous sync when I am reading the binlog the inserts and updates have to be run sequentially which are very slow.

Is there anything that can be done for increasing the performance?

One possible solution that I can think of is to wrap the statements in a transaction and then send the transaction at once, to avoid multiple network calls. But that would not address the problem that single update and insert statements in redshift run very slow. A single update statement is taking 6s. Knowing the limitations of redshift (That it is a columnar database and single row insertion will be slow) what can be done to work around those limitations?


Edit 1: Regarding DMS: I want to use redshift as a warehousing solution which just replicates our MYSQL continuously, I don't want to denormalise the data since I have 170+ tables in mysql. During ongoing replication, DMS shows many errors multiple times in a day and fails completely after a day or two and it's very hard to decipher DMS error logs. Also, When I drop and reload tables, it deletes the existing tables on redshift and creates and new table and then starts inserting data which causes downtime in my case. What I wanted was to create a new table and then switch the old one with new one and delete old table

Here is what you need to do to get DMS to work

1) create and run a dms task with "migrate and ongoing replication" and "Drop tables on target"

2) this will probably fail, do not worry. "stop" the dms task.

3) on redshift make the following changes to the table

  • Change all dates and timestamps to varchar (because the options used by dms for redshift copy cannot cope with '00:00:00 00:00' dates that you get in mysql)
  • change all bool to be varchar - due to a bug in dms.

4) on dms - modify the task to "Truncate" in "Target table preparation mode"

5) restart the dms task - full reload

now - the initial copy and ongoing binlog replication should work.

Make sure you are on latest replication instance software version

Make sure you have followed the instructions here exactly

http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html

If your source is aurora, also make sure you have set binlog_checksum to "none" (bad documentation)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM