Redshift data mirroring / mocking

Original 2018-03-12 03:14:22 8 1 amazon-web-services / amazon-redshift

We have a requirement to mirror data from one environment to another in AWS Redshift. Would that be possible using snapshots, or is there another way?

The mirroring/replication of data is to test a proof of concept by our vendor, and it is between two clusters residing in the same region. We are not sure about the lag, or about what the best approach would be here; please suggest.

It is essentially from the prod to the test environment. Regarding one-directional, did you mean the data flow? Please suggest any best practices that we can look at in this scenario.

1 answer

It appears that your requirement is to replicate an Amazon Redshift production cluster to another Amazon Redshift cluster that will be used for testing.

Since test systems rarely need up-to-date production data, such replication would typically be done weekly or, if automated, daily.

It is also best practice to cleanse the data being replicated to test systems to remove sensitive customer information. This could be done by running a script that clears or obfuscates particular columns (e.g. credit card numbers).
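A cleanup script of this kind could be as simple as a list of UPDATE statements run against the Test cluster. The sketch below only generates the SQL text. The table names, column names and masking expressions are hypothetical and would need to match your own schema, and you would execute the statements with whatever SQL client you already use.

```python
def build_cleanup_sql(rules):
    """Return one UPDATE statement per (table, column, expression) rule."""
    statements = []
    for table, column, expression in rules:
        statements.append(f"UPDATE {table} SET {column} = {expression};")
    return statements


# Hypothetical tables and columns, for illustration only.
CLEANUP_RULES = [
    # Clear the value entirely.
    ("customers", "credit_card_number", "NULL"),
    # Obfuscate the value but keep it unique per row.
    ("customers", "email", "MD5(email) || '@example.com'"),
]

cleanup_sql = build_cleanup_sql(CLEANUP_RULES)
for statement in cleanup_sql:
    print(statement)
```

Keeping the rules in a list makes it easy to review which columns are treated as sensitive.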

If you are happy with completely deleting the test system every time it is reloaded from production, the easiest way to replicate would be:

1. Delete the old Test cluster
2. Take a snapshot of the production system
3. Launch a new cluster from the snapshot
4. Run any cleanup scripts against the Test system
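As a rough sketch, the first three steps map onto AWS CLI `redshift` subcommands, with `wait` calls in between so that each step finishes before the next starts. The cluster and snapshot identifiers below are placeholders, and the script defaults to a dry run that only prints the commands. Treat it as a starting point under those assumptions, not a tested deployment script.

```python
import subprocess


def build_refresh_commands(prod_id, test_id, snapshot_id):
    """Build the AWS CLI calls for: delete Test, snapshot prod, restore Test."""
    redshift = ["aws", "redshift"]
    return [
        # Step 1: delete the old Test cluster without keeping a final snapshot.
        redshift + ["delete-cluster", "--cluster-identifier", test_id,
                    "--skip-final-cluster-snapshot"],
        redshift + ["wait", "cluster-deleted", "--cluster-identifier", test_id],
        # Step 2: take a manual snapshot of production.
        redshift + ["create-cluster-snapshot", "--cluster-identifier", prod_id,
                    "--snapshot-identifier", snapshot_id],
        redshift + ["wait", "snapshot-available",
                    "--snapshot-identifier", snapshot_id],
        # Step 3: launch a new Test cluster from that snapshot.
        redshift + ["restore-from-cluster-snapshot",
                    "--cluster-identifier", test_id,
                    "--snapshot-identifier", snapshot_id],
        redshift + ["wait", "cluster-restored", "--cluster-identifier", test_id],
    ]


def run_refresh(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Placeholder identifiers; substitute your own.
commands = build_refresh_commands(
    "prod-cluster", "test-cluster", "prod-refresh-snapshot"
)
run_refresh(commands)  # dry run: prints the commands only
```

Snapshot identifiers must be unique, so a scheduled version would probably add a date to the name and remove old snapshots. Step 4, the cleanup scripts, would run once the restored cluster is available.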

The new cluster would initially need to be the same size as the original cluster (e.g. if production is 3-node, then the new cluster would be 3-node), but you can later resize the test system if you wish to reduce costs.

These steps can easily be automated in a script that uses the AWS Command-Line Interface (CLI). You could, for example, run it each night so that there is a fresh Test system available each day.
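One way to do the later resize is the AWS CLI `modify-cluster` command, sketched below as a helper that only builds the command line. The cluster identifier and target node count are placeholders, and whether the data fits on fewer nodes depends on your node type and data volume, so check that first.

```python
def build_resize_command(cluster_id, number_of_nodes):
    """Build an `aws redshift modify-cluster` call for the given node count."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    command = ["aws", "redshift", "modify-cluster",
               "--cluster-identifier", cluster_id]
    if number_of_nodes == 1:
        command += ["--cluster-type", "single-node"]
    else:
        command += ["--cluster-type", "multi-node",
                    "--number-of-nodes", str(number_of_nodes)]
    return command


# Placeholder: shrink a restored 3-node Test cluster to 2 nodes.
resize_command = build_resize_command("test-cluster", 2)
print(" ".join(resize_command))
```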

Update: Restoring only a portion of the cluster

Based upon the comment below, it appears that you are using a Redshift cluster for both Dev and Test, as different databases within the cluster.

In this situation, you could use the Restore Table functionality:

1. Delete all tables in the Test database
2. Restore each table individually from the snapshot

This would involve more steps and would not be advisable. It might be simpler to use a separate cluster for Test, so that it can be totally deleted and repopulated.
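If you do go the per-table route, each restore corresponds to one `aws redshift restore-table-from-cluster-snapshot` call. The sketch below builds one such command per table. The cluster, snapshot, database and table names are all placeholders, and it assumes the snapshot was taken from the same cluster that the tables are restored into.

```python
def build_table_restore_commands(cluster_id, snapshot_id,
                                 source_db, target_db, tables):
    """Build one restore-table command per table name."""
    commands = []
    for table in tables:
        commands.append([
            "aws", "redshift", "restore-table-from-cluster-snapshot",
            "--cluster-identifier", cluster_id,
            "--snapshot-identifier", snapshot_id,
            "--source-database-name", source_db,
            "--source-table-name", table,
            "--target-database-name", target_db,
            # Keep the same table name in the target database.
            "--new-table-name", table,
        ])
    return commands


# Placeholder names, for illustration only.
table_commands = build_table_restore_commands(
    "shared-cluster", "shared-cluster-snapshot",
    "dev", "test", ["customers", "orders"],
)
for command in table_commands:
    print(" ".join(command))
```

The need for one call per table is part of why the separate-cluster approach tends to be simpler.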


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any questions, please contact: yoyou2525@163.com.

solution1 0 2018-03-12 07:23:40