
Slow INSERT and UPDATE commands during MySQL to Redshift replication

Original post: 2017-11-16 15:22:04 · Tags: amazon-redshift, binlog

I am trying to build a replication server from MySQL to Redshift by parsing the MySQL binlog. For the initial replication, I dump each MySQL table, convert the dump to a CSV file, upload it to S3, and then load it with the Redshift COPY command. This performs well.
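For reference, the initial-load path described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the table name, S3 URI, and IAM role are hypothetical placeholders, and the upload to S3 and the execution of the statement are left out.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize dumped rows to CSV text; None becomes an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def build_copy_statement(table, s3_uri, iam_role):
    """Build a Redshift COPY statement for a CSV file already uploaded to S3.

    All three arguments are placeholders supplied by the caller.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    print(rows_to_csv([(1, "a,b"), (2, None)]))
    print(build_copy_statement(
        "orders",                                    # hypothetical table
        "s3://example-bucket/dump/orders.csv",       # hypothetical S3 object
        "arn:aws:iam::123456789012:role/example",    # hypothetical role ARN
    ))
```

COPY loads the whole file in one bulk operation, which is why this step performs well compared with row-by-row statements.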

After the initial replication, the continuous sync reads the binlog and has to run the inserts and updates sequentially, which is very slow.

Is there anything that can be done to improve the performance?

One possible solution I can think of is to wrap the statements in a transaction and send the transaction at once, to avoid multiple network calls. But that would not address the problem that individual UPDATE and INSERT statements run very slowly in Redshift: a single UPDATE statement takes 6 s. Given Redshift's limitations (it is a columnar database, so single-row insertion will be slow), what can be done to work around them?
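The transaction-wrapping idea mentioned above could look something like this. It is only a sketch: `chunked` and `wrap_in_transaction` are hypothetical helper names, the batch size is arbitrary, and whether a multi-statement string can be sent in a single call depends on the database driver in use.

```python
def chunked(statements, size):
    """Yield lists of at most `size` statements, preserving binlog order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for stmt in statements:
        batch.append(stmt)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def wrap_in_transaction(statements):
    """Join statements into one BEGIN ... COMMIT block of SQL text."""
    cleaned = [s.strip().rstrip(";") for s in statements if s.strip()]
    if not cleaned:
        return ""
    return "BEGIN;\n" + ";\n".join(cleaned) + ";\nCOMMIT;"


if __name__ == "__main__":
    # Hypothetical statements derived from binlog row events.
    events = [
        "INSERT INTO t (id, a) VALUES (1, 10);",
        "UPDATE t SET a = 20 WHERE id = 1;",
        "INSERT INTO t (id, a) VALUES (2, 30);",
    ]
    for batch in chunked(events, 2):
        print(wrap_in_transaction(batch))
```

As the question itself notes, this only reduces round trips and commits; each statement inside the block still runs as a single-row operation.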

Edit 1: Regarding DMS: I want to use Redshift as a warehousing solution that simply replicates our MySQL database continuously. I don't want to denormalise the data, since I have 170+ tables in MySQL. During ongoing replication, DMS reports errors several times a day and fails completely after a day or two, and its error logs are very hard to decipher. Also, when I drop and reload tables, DMS deletes the existing tables on Redshift, creates new ones, and then starts inserting data, which causes downtime in my case. What I want instead is to create a new table, swap it in for the old one, and then delete the old table.
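The swap described in the last sentence can be expressed as a pair of renames. The sketch below only generates the SQL; the `_new` and `_old` suffixes are hypothetical naming conventions, table names are assumed to be unqualified, and it assumes the renames may run inside one transaction on your cluster, which is worth verifying before relying on it.

```python
def build_swap_statements(table, new_suffix="_new", old_suffix="_old"):
    """Return SQL that swaps a freshly loaded table in for the live one.

    Assumes `<table><new_suffix>` has already been created and loaded.
    """
    staged = f"{table}{new_suffix}"
    retired = f"{table}{old_suffix}"
    return [
        "BEGIN;",
        f"ALTER TABLE {table} RENAME TO {retired};",   # move the live table aside
        f"ALTER TABLE {staged} RENAME TO {table};",    # promote the new table
        "COMMIT;",
        f"DROP TABLE {retired};",                      # remove the old data afterwards
    ]


if __name__ == "__main__":
    for stmt in build_swap_statements("orders"):  # hypothetical table name
        print(stmt)
```

Readers only see the old table or the new one, so the reload no longer leaves a window in which the table is empty.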

1 answer

Here is what you need to do to get DMS to work:

1) Create and run a DMS task with "migrate and ongoing replication" and "Drop tables on target".

2) This will probably fail; do not worry. Stop the DMS task.

3) On Redshift, make the following changes to the tables:

Change all date and timestamp columns to varchar (the options DMS uses for the Redshift COPY cannot cope with the '00:00:00 00:00' dates that you get in MySQL).
Change all bool columns to varchar, due to a bug in DMS.

4) In DMS, modify the task to use "Truncate" as the "Target table preparation mode".

5) Restart the DMS task with a full reload.

Now the initial copy and the ongoing binlog replication should work.
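If the task is managed through the API rather than the console, the two preparation modes used in the steps above correspond to values of `TargetTablePrepMode` in the task settings JSON. The sketch below only builds that JSON fragment; the mapping from the console labels quoted in the answer is my reading of them, and the helper name is hypothetical.

```python
import json

# Console labels (as quoted in the answer) mapped to task-settings values.
PREP_MODES = {
    "Do nothing": "DO_NOTHING",
    "Drop tables on target": "DROP_AND_CREATE",
    "Truncate": "TRUNCATE_BEFORE_LOAD",
}

# "migrate and ongoing replication" corresponds to this migration type.
MIGRATION_TYPE = "full-load-and-cdc"


def build_task_settings(console_label):
    """Return a task-settings JSON fragment for the given console label."""
    if console_label not in PREP_MODES:
        raise ValueError(f"unknown preparation mode: {console_label!r}")
    settings = {
        "FullLoadSettings": {"TargetTablePrepMode": PREP_MODES[console_label]}
    }
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(build_task_settings("Drop tables on target"))  # step 1
    print(build_task_settings("Truncate"))               # step 4
```

With boto3, a string like this is what the `ReplicationTaskSettings` parameter of `create_replication_task` or `modify_replication_task` takes; whether a fragment this small is accepted on its own, rather than a full settings document, is something to check against the DMS documentation.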

Make sure you are on the latest replication instance software version.

Make sure you have followed the instructions here exactly:

http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html

If your source is Aurora, also make sure you have set binlog_checksum to "NONE" (this is poorly documented).
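A quick way to confirm the binlog settings is to read them from the source with `SHOW GLOBAL VARIABLES LIKE 'binlog%'` and compare them with what DMS expects. The sketch below assumes the result has already been fetched into a dict; the `binlog_checksum` value comes from this answer, while the `binlog_format` entry is taken from the linked AWS page and should be checked against it.

```python
# Expected values: binlog_checksum is from the answer above; binlog_format
# is an assumption based on the linked DMS documentation for MySQL sources.
EXPECTED = {
    "binlog_format": "ROW",
    "binlog_checksum": "NONE",
}


def find_binlog_problems(variables, expected=EXPECTED):
    """Return a description of each variable that differs from `expected`.

    `variables` maps variable names to their current values, for example
    built from the rows returned by SHOW GLOBAL VARIABLES LIKE 'binlog%'.
    """
    problems = []
    for name, want in expected.items():
        got = variables.get(name)
        if got is None or got.upper() != want:
            problems.append(f"{name}: expected {want}, got {got}")
    return problems


if __name__ == "__main__":
    current = {"binlog_format": "ROW", "binlog_checksum": "CRC32"}  # sample values
    for problem in find_binlog_problems(current):
        print(problem)
```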



solution1 0 2017-11-17 10:21:42
