Replication pipeline to replicate data from MySql RDS to Redshift
My problem here is to create a replication pipeline that replicates tables and data from MySQL RDS to Redshift, and I cannot use any managed service. Also, any new updates in RDS should be replicated to the Redshift tables as well.
After looking at many solutions, I came to an understanding of the following steps:
So, I just wanted to confirm whether the above approach is fine. Also, every time an update happens, will the old data be deleted completely and replaced by the new, or is it possible to update only the necessary records? If yes, then how?

Any help will be really appreciated. Thanks in advance.
Yes, the above strategy is not just fine, it's good. I use it in a production system and it works great, though you have to be careful and craft this strategy to make sure that it solves your use case effectively and efficiently.
Here are a few points on what I mean by effectively and efficiently:

- Effectively identify the records that need to be moved to Redshift, meaning identify the potential records with optimized queries that keep CPU and memory usage in check.
- Efficiently export the data to be loaded into Redshift, which includes data-size optimization so that it uses minimum storage and network bandwidth, e.g. compress (gzip) the CSV files so that they take minimum space in S3 storage and save network bandwidth.
- Write the Redshift COPY queries in a way that they execute in parallel.
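As a concrete illustration of the export/compress/load points above, here is a minimal Python sketch; all file, bucket, table, and role names are hypothetical placeholders, not from the original post. Redshift's COPY loads files in parallel when they share a common S3 prefix, so the export is split into several gzipped CSV parts:

```python
import csv
import gzip


def write_gzipped_parts(rows, prefix, rows_per_part=2):
    """Split exported rows into several gzipped CSV parts.

    Redshift COPY parallelizes across input files, so several
    small gzipped parts load faster than one big file.
    """
    paths = []
    for i in range(0, len(rows), rows_per_part):
        path = f"{prefix}.part{i // rows_per_part:04d}.csv.gz"
        with gzip.open(path, "wt", newline="") as f:
            csv.writer(f).writerows(rows[i:i + rows_per_part])
        paths.append(path)
    return paths


def build_copy_sql(table, s3_prefix, iam_role):
    """COPY command that loads every gzipped part under the S3 prefix."""
    return (
        f"COPY {table} "
        f"FROM 's3://{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP;"
    )
```

Each part would then be uploaded under the same S3 prefix (e.g. with the AWS CLI or boto3) before running the generated COPY statement against Redshift.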
Hope this will help.
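On the asker's follow-up, whether only the necessary records can be updated: Redshift has no native upsert, and the pattern its documentation recommends is a staged merge: COPY the changed rows into a staging table, then, in one transaction, delete the matching rows from the target and insert the staging rows. A minimal sketch that generates that SQL (table and key names here are hypothetical):

```python
def build_merge_sql(target, staging, key_columns):
    """Staged-merge (upsert) for Redshift: replace only the rows
    whose keys appear in the freshly loaded staging table."""
    match = " AND ".join(
        f"{target}.{k} = {staging}.{k}" for k in key_columns
    )
    return (
        "BEGIN;\n"
        f"DELETE FROM {target} USING {staging} WHERE {match};\n"
        f"INSERT INTO {target} SELECT * FROM {staging};\n"
        "COMMIT;"
    )
```

One caveat: clear the staging table with TRUNCATE only *after* the COMMIT, since in Redshift TRUNCATE implicitly commits the current transaction.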