

Slow insert and update commands during mysql to redshift replication

I am trying to build a replication pipeline from MySQL to Redshift, and for this I am parsing the MySQL binlog. For the initial replication, I take a dump of the MySQL table, convert it into a CSV file, upload it to S3, and then use the Redshift COPY command. This initial load performs well.
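The initial load described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code; the table name, bucket path, and IAM role ARN are placeholders, and the real pipeline would upload the CSV to S3 (e.g. with boto3) before issuing the COPY.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize an iterable of row tuples to CSV text, ready for upload to S3."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()

def copy_statement(table, s3_path, iam_role):
    """Compose the Redshift COPY command that bulk-loads the uploaded file."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV;"
    )

csv_text = rows_to_csv([(1, "alice"), (2, "bob")])
sql = copy_statement("users", "s3://my-bucket/users.csv",
                     "arn:aws:iam::123456789012:role/RedshiftCopy")
```

COPY is fast precisely because it loads the whole file in one set-based operation, which is why this step is efficient while per-row statements (below) are not.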

After the initial replication, during the continuous sync, the inserts and updates read from the binlog have to be run sequentially, which is very slow.

Is there anything that can be done to increase the performance?

One possible solution I can think of is to wrap the statements in a transaction and send the transaction all at once, to avoid multiple network calls. But that would not address the underlying problem: single UPDATE and INSERT statements in Redshift run very slowly. A single UPDATE statement is taking 6 s. Knowing Redshift's limitations (it is a columnar database, so single-row insertion is slow), what can be done to work around them?
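A common workaround for this limitation (a sketch, not something from the question itself) is to buffer binlog events, land each batch in a staging table via S3 + COPY, and then apply the whole batch with one set-based merge. Redshift has no native UPSERT, so the standard pattern is delete-then-insert inside a single transaction. All names below are hypothetical:

```python
class BinlogBatcher:
    """Buffer binlog change events and flush them as one batch instead of
    issuing per-row INSERT/UPDATE statements against Redshift.  The flush
    step (not shown) would write the batch to S3, COPY it into a staging
    table, then run the merge transaction below."""

    def __init__(self, batch_size=10000):
        self.batch_size = batch_size
        self.events = []

    def add(self, event):
        self.events.append(event)
        return len(self.events) >= self.batch_size  # True -> time to flush

def merge_sql(target, staging, key):
    """Build the set-based Redshift merge: delete the rows about to be
    replaced, then bulk-insert the staging rows, in one transaction.
    TRUNCATE is issued after END because it commits implicitly on Redshift."""
    return "\n".join([
        "BEGIN;",
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END;",
        f"TRUNCATE {staging};",
    ])

sql = merge_sql("users", "users_staging", "id")
```

This amortizes the ~6 s per-statement cost over thousands of rows, at the price of replication latency equal to the batching interval.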


Edit 1: Regarding DMS: I want to use Redshift as a warehousing solution that simply replicates our MySQL continuously; I don't want to denormalize the data, since I have 170+ tables in MySQL. During ongoing replication, DMS shows many errors multiple times a day and fails completely after a day or two, and its error logs are very hard to decipher. Also, when I drop and reload tables, it deletes the existing tables on Redshift, creates new ones, and then starts inserting data, which causes downtime in my case. What I wanted was to create a new table, switch the old one with the new one, and then delete the old table.

Here is what you need to do to get DMS to work:

1) Create and run a DMS task with "migrate and ongoing replication" and "Drop tables on target".

2) This will probably fail; do not worry. "Stop" the DMS task.

3) On Redshift, make the following changes to the table:

  • Change all dates and timestamps to varchar (because the options DMS uses for the Redshift COPY cannot cope with the '00:00:00 00:00' dates you get in MySQL)
  • Change all bool columns to varchar, due to a bug in DMS.

4) On DMS, modify the task to "Truncate" in "Target table preparation mode".

5) Restart the DMS task (full reload).
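Since Redshift cannot change a column's type in place (other than widening a varchar), step 3 usually means an add/copy/drop/rename sequence per column. A hypothetical sketch that generates that DDL (table, column, and type are placeholders; note the rewritten column ends up last in the column order):

```python
def retype_column_sql(table, column, new_type="varchar(32)"):
    """Generate the add/copy/drop/rename statements needed to change a
    Redshift column's type (e.g. timestamp -> varchar for DMS compatibility)."""
    tmp = f"{column}_tmp"
    return "\n".join([
        f"ALTER TABLE {table} ADD COLUMN {tmp} {new_type};",
        f"UPDATE {table} SET {tmp} = CAST({column} AS {new_type});",
        f"ALTER TABLE {table} DROP COLUMN {column};",
        f"ALTER TABLE {table} RENAME COLUMN {tmp} TO {column};",
    ])

ddl = retype_column_sql("orders", "created_at")
```

Since DMS recreates the tables, these changes must be applied after step 1 creates the schema but before the truncate-and-reload in step 5.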

Now the initial copy and ongoing binlog replication should work.

Make sure you are on the latest replication instance software version.

Make sure you have followed the instructions here exactly:

http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html

If your source is Aurora, also make sure you have set binlog_checksum to "none" (this is poorly documented).
