Redshift data mirroring / mocking

We have a requirement to mirror data from one environment to another in Amazon Redshift. Would that be possible using snapshots, or is there another way to do it?

The mirroring/replication of data is for a proof of concept being tested by our vendor, and it is between two clusters residing in the same region. We are not sure how much lag is acceptable or what the best approach would be here; please suggest.

It is essentially from the prod environment to the test environment. Regarding "one-directional", did you mean the data flow? Please suggest any best practices that we should look at for this scenario.

It appears that your requirement is to replicate an Amazon Redshift production cluster to another Amazon Redshift cluster that will be used for testing.

Since test systems rarely need up-to-date production data, such replication would typically be done weekly or, if automated, daily.

It is also best practice to cleanse the data being replicated to test systems so that sensitive customer information is removed. This could be done by running a script that clears or obfuscates particular columns (e.g. credit card numbers).
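As a rough illustration of such a cleanup script, the Python sketch below builds one UPDATE statement that overwrites sensitive columns. The table name `customers`, the column names and the masking expressions are hypothetical; the real list depends on your schema and on what counts as sensitive in your data.

```python
def build_mask_statement(table, masks):
    """Build one UPDATE statement that overwrites sensitive columns.

    `masks` maps a column name to the SQL expression that replaces its value.
    """
    assignments = ", ".join(
        f"{column} = {expression}" for column, expression in masks.items()
    )
    return f"UPDATE {table} SET {assignments};"


# Hypothetical table and columns, for illustration only.
statement = build_mask_statement(
    "customers",
    {
        # Keep only the last four digits of the card number.
        "credit_card_number": "'XXXX-XXXX-XXXX-' || RIGHT(credit_card_number, 4)",
        # Replace the e-mail address with a hash under a dummy domain.
        "email": "MD5(email) || '@example.com'",
    },
)
print(statement)
```

The generated SQL would be run against the Test cluster only, using whichever SQL client you already use to connect to Redshift.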

If you are happy with completely deleting the test system every time it is reloaded from production, the easiest way to replicate would be:

  • Delete the old Test cluster
  • Take a snapshot of the production system
  • Launch a new cluster from the snapshot
  • Run any cleanup scripts against the Test system

The new cluster would initially need to be the same size as the original cluster (e.g. if production is 3-node, then the new cluster would be 3-node), but you can later resize the test system if you wish to reduce costs.
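A resize of the restored Test cluster could be requested with the `aws redshift resize-cluster` command. The small Python sketch below only builds the argument list; the cluster name `test-cluster` is a placeholder, and whether a given target node count is allowed depends on the node type and resize method, so treat it as a starting point.

```python
def build_resize_command(cluster, number_of_nodes):
    """Return the AWS CLI arguments that resize a multi-node cluster."""
    if number_of_nodes < 2:
        # A single-node cluster needs different arguments, so it is left out here.
        raise ValueError("this sketch only handles multi-node targets")
    return [
        "aws", "redshift", "resize-cluster",
        "--cluster-identifier", cluster,
        "--number-of-nodes", str(number_of_nodes),
    ]


# Placeholder cluster name, for illustration only.
print(" ".join(build_resize_command("test-cluster", 2)))
```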

The refresh steps listed above can easily be automated in a script that uses the AWS Command-Line Interface (CLI). You could, for example, run it each night so that there is a fresh Test system available each day.
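A minimal sketch of such a script is shown below, written in Python as a thin wrapper around the AWS CLI. The cluster and snapshot identifiers are placeholders. It assumes the CLI is installed and configured with suitable permissions, and it leaves out the final cleanup step as well as any extra restore options (such as subnet group or security settings) that your environment may need.

```python
import subprocess


def build_refresh_plan(prod_cluster, test_cluster, snapshot_id):
    """Return the AWS CLI commands, in order, for a full refresh of Test."""
    aws = ["aws", "redshift"]
    return [
        # 1. Delete the old Test cluster without keeping a final snapshot.
        aws + ["delete-cluster", "--cluster-identifier", test_cluster,
               "--skip-final-cluster-snapshot"],
        aws + ["wait", "cluster-deleted", "--cluster-identifier", test_cluster],
        # 2. Take a manual snapshot of production and wait until it is usable.
        aws + ["create-cluster-snapshot", "--cluster-identifier", prod_cluster,
               "--snapshot-identifier", snapshot_id],
        aws + ["wait", "snapshot-available", "--snapshot-identifier", snapshot_id],
        # 3. Launch a new Test cluster from that snapshot.
        aws + ["restore-from-cluster-snapshot", "--cluster-identifier", test_cluster,
               "--snapshot-identifier", snapshot_id],
        aws + ["wait", "cluster-restored", "--cluster-identifier", test_cluster],
    ]


def run_plan(plan, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the refresh at the first failing command.
            subprocess.run(command, check=True)


# Placeholder identifiers, for illustration only.
plan = build_refresh_plan("prod-cluster", "test-cluster", "nightly-refresh")
run_plan(plan, dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to review them before letting a scheduler run it each night. Note that the first command fails if no Test cluster exists yet, so the very first run may need that step skipped.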

Update: Restoring only a portion of the cluster

Based upon the comment below, it appears that you are using a Redshift cluster for both Dev and Test, as different databases within the cluster.

In this situation, you could use the Restore Table functionality:

  • Delete all tables in the Test database
  • Restore each table individually from the snapshot

Restoring table by table would involve more steps and would not be advisable. It might be simpler to use a separate cluster for Test, so that it can be totally deleted and repopulated.
