How to optimize AWS DMS replication from Aurora MySQL to Redshift?

I've been using AWS DMS to perform ongoing replication from Aurora MySQL to Redshift. However, the ongoing replication is causing a constant 25-30% CPU load on the target. This is because it produces many small files on S3 and loads/processes them non-stop. Redshift is not really designed for handling a large number of small tasks.

To optimize this, I've set it up so that the task starts at the beginning of each hour, waits until the target is in sync, and then stops. So, instead of working continually, it works for 5-8 minutes at the beginning of each hour. Even so, it is still very slow and unoptimized, because it still has to process hundreds of small S3 files, just in a shorter time span.
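The hourly start/wait/stop cycle described above could be scripted roughly as follows. This is a minimal sketch, not the asker's actual setup. It assumes client is a boto3 DMS client (boto3.client("dms")) and that "in sync" is judged by a caller-supplied target_latency function. That function is hypothetical; one option would be to read the CDCLatencyTarget CloudWatch metric for the task. The threshold, poll interval and timeout values are illustrative only.

```python
import time
from typing import Callable


def task_status(client, task_arn: str) -> str:
    """Return the current status string (e.g. "running", "stopped") of one DMS task."""
    resp = client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    return resp["ReplicationTasks"][0]["Status"]


def run_sync_window(
    client,
    task_arn: str,
    target_latency: Callable[[], float],  # hypothetical: seconds the target lags the source
    threshold_s: float = 60.0,            # illustrative "in sync" cutoff
    poll_s: float = 30.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Resume CDC, wait until the target catches up (or time out), then stop the task.

    Returns True if the target latency fell to threshold_s or below before timeout_s.
    """
    # Resume from the last checkpoint instead of restarting a full load.
    client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="resume-processing",
    )
    waited = 0.0
    try:
        while waited < timeout_s:
            sleep(poll_s)
            waited += poll_s
            # Ignore latency readings until the task reports that it is running.
            if task_status(client, task_arn) != "running":
                continue
            if target_latency() <= threshold_s:
                return True
        return False
    finally:
        # Always stop the task, whether it caught up or timed out.
        client.stop_replication_task(ReplicationTaskArn=task_arn)
```

The finally clause ensures the task is stopped even if the latency check raises or the window times out. Scheduling the function at the top of each hour (cron, EventBridge, etc.) is left out of the sketch.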

Can this be optimized further? Is there a way to tell DMS to buffer these changes for a longer period of time, and produce fewer, larger S3 files instead of many small ones? We really don't mind having higher target latency.

The amount of data transferred between Aurora and Redshift is rather small. There are around 20K changes per hour, and we're using a 4-node dc1.large Redshift cluster. It should be able to handle those 20K changes in a matter of seconds, not minutes.

Maybe you can try BatchApplyTimeoutMin and BatchApplyTimeoutMax: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TaskSettings.ChangeProcessingTuning.html

BatchApplyTimeoutMin sets the minimum amount of time in seconds that AWS DMS waits between each application of batch changes. The default value is 1.

You can change the value to 1200, or even 3600.
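A minimal sketch of applying this suggestion with boto3, assuming client is boto3.client("dms"). It reads the task's current settings JSON, sets BatchApplyTimeoutMin and BatchApplyTimeoutMax in the ChangeProcessingTuning section, and writes the settings back with modify_replication_task. Raising the maximum together with the minimum is an assumption here: the documented default for BatchApplyTimeoutMax is 30 seconds, so a minimum of 1200 on its own would exceed it. The task must be stopped before it can be modified. Whether every field returned by describe_replication_tasks can be passed back unchanged has not been verified here.

```python
import json


def with_batch_timeouts(settings_json: str, min_s: int, max_s: int) -> str:
    """Return a copy of a DMS task-settings JSON string with new batch-apply timeouts."""
    if not 0 < min_s <= max_s:
        raise ValueError("expected 0 < BatchApplyTimeoutMin <= BatchApplyTimeoutMax")
    settings = json.loads(settings_json) if settings_json else {}
    tuning = settings.setdefault("ChangeProcessingTuning", {})
    tuning["BatchApplyTimeoutMin"] = min_s  # minimum seconds DMS waits between batch applies
    tuning["BatchApplyTimeoutMax"] = max_s  # maximum seconds DMS waits between batch applies
    return json.dumps(settings)


def apply_batch_timeouts(client, task_arn: str, min_s: int, max_s: int) -> None:
    """Read the task's settings, update the timeouts, and write them back.

    The DMS task must be stopped before modify_replication_task is called.
    """
    resp = client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    current = resp["ReplicationTasks"][0].get("ReplicationTaskSettings", "")
    client.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=with_batch_timeouts(current, min_s, max_s),
    )
```

For example, apply_batch_timeouts(boto3.client("dms"), task_arn, 1200, 3600) on the stopped task, then resume it. Other settings already present in the JSON are left as they were.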
