
Enabling Encryption on a Redshift Cluster with existing data

I've been charged with enabling encryption on a Redshift cluster which has a significant amount of existing data. Based on this link I know that when enabled it will create a new cluster and copy the existing data across, making access to it read-only during this time. We have a number of ETL jobs that run against the Redshift cluster and I'm trying to determine roughly how long I can expect the migration to take. Is there any kind of estimation available based on data size/node type/cluster config?
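For context, this is how I expect to kick it off — a minimal boto3 sketch, where the region, cluster identifier and KMS key are placeholders rather than our real names:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")  # example region

# Ask Redshift to encrypt the cluster; behind the scenes it provisions a new
# encrypted cluster and copies the data across, leaving the cluster read-only.
redshift.modify_cluster(
    ClusterIdentifier="my-cluster",  # placeholder identifier
    Encrypted=True,
    # KmsKeyId="alias/my-key",       # optional; omit to use the default Redshift KMS key
)

# Progress is visible in the cluster status until it returns to "available".
cluster = redshift.describe_clusters(ClusterIdentifier="my-cluster")["Clusters"][0]
print(cluster["ClusterStatus"], cluster["Encrypted"])
```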

Is there any kind of estimation available based on data size/node type/cluster config?

Basically, no. The amount of time this takes will depend on a number of factors, some of which are outside your control, so it's very hard to predict.

You should absolutely test this first so you understand the implications and how long it's likely to take, e.g. (a boto3 sketch of these steps follows the list):

  • Create a new, identical cluster by restoring a snapshot of your original cluster
  • Follow the steps to encrypt the cluster and record the time taken
  • Ideally, test your existing ETL jobs with the encrypted cluster
  • Drop the test cluster
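A rough sketch of that rehearsal using boto3 — the cluster name, snapshot identifier, region and polling interval below are placeholders, not anything from your setup:

```python
import time

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")  # example region

TEST_CLUSTER = "encryption-timing-test"   # throwaway cluster name (placeholder)
SNAPSHOT_ID = "my-cluster-snapshot"       # snapshot of the original cluster (placeholder)


def wait_until(cluster_id, done, poll_seconds=300):
    """Poll describe_clusters until done(cluster) returns True."""
    while True:
        cluster = redshift.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"][0]
        if done(cluster):
            return cluster
        time.sleep(poll_seconds)


# 1. Restore an identical copy of the original cluster from a snapshot.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier=TEST_CLUSTER,
    SnapshotIdentifier=SNAPSHOT_ID,
)
wait_until(TEST_CLUSTER, lambda c: c["ClusterStatus"] == "available")

# 2. Enable encryption on the copy and time the migration.
start = time.time()
redshift.modify_cluster(ClusterIdentifier=TEST_CLUSTER, Encrypted=True)
time.sleep(60)  # give the status a moment to leave "available"
wait_until(TEST_CLUSTER,
           lambda c: c["ClusterStatus"] == "available" and c.get("Encrypted"))
print(f"Encryption took {(time.time() - start) / 3600:.1f} hours")

# 3. (Point your existing ETL jobs at the test cluster here if you want to validate them.)

# 4. Drop the test cluster.
redshift.delete_cluster(ClusterIdentifier=TEST_CLUSTER, SkipFinalClusterSnapshot=True)
```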

Based on my experience with resizing clusters (a similar but not identical exercise), I would allow a +/- 10-15% margin on your test time due to variability in local AWS resources, network traffic etc.

If it's possible, I'd advise killing all connections to the cluster to speed up the process. We discovered that a process which frequently polled our cluster caused the resize process to take longer.
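If you want to script that, one option is to terminate every session except your own and Redshift's internal rdsdb user. A sketch using psycopg2, where the host, database and credentials are placeholders:

```python
import psycopg2

# Placeholders: use your own cluster endpoint, database and credentials.
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="mydb", user="admin", password="...",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Find every session except our own and Redshift's internal rdsdb user.
    cur.execute("""
        SELECT process
        FROM stv_sessions
        WHERE user_name <> 'rdsdb'
          AND process <> pg_backend_pid()
    """)
    for (pid,) in cur.fetchall():
        cur.execute("SELECT pg_terminate_backend(%s)", (pid,))

conn.close()
```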

For a reference point, a 20-node ds cluster with approx. 25 TB of data took around 20 hours to resize.

Enabling encryption on a non-encrypted cluster takes a lot of time. For example, a 2 TB cluster took 50 hours, and we cannot accept that long a hold on our ETL jobs.

Do we have any other way to enable encryption?
