
In Redshift, is it possible to pause/resume a VACUUM FULL job?

I'm working with an exceptionally large table. Because of a data issue, I had to re-insert data for a couple of historical dates. After the insertion, I wanted to run a manually triggered VACUUM FULL. Unfortunately, VACUUM FULL on that table takes more than several days to complete. Since Redshift allows only one VACUUM operation at a time, the other, smaller tables cannot run their daily VACUUM until the large table has finished.

My question: is there a way to pause a VACUUM operation on the large table to make room for vacuuming the smaller tables? Does terminating a VACUUM reset the operation, or will re-running the VACUUM command resume from the last successful state?

Sorry, I'm trying to learn more about how the VACUUM process works in Redshift, but I haven't been able to find much information on it. I would really appreciate some explanation or links to docs in your answer as well.

Note: I did try a deep copy, as mentioned in the official docs. However, the table is too large to copy in one go, so that's not an option.

Thanks!

As far as I know, there is no way to pause a (manual) vacuum. However, vacuum runs in "passes": in each pass some vacuum work is done and the partial results are committed. This allows other work to progress while the vacuum is running. If you terminate the vacuum midway, the blocks committed by earlier passes keep that partial work. The work done in the current pass is lost, and the restarted vacuum needs to scan the entire table to figure out where to start. Last I knew, this works, but you lose some progress with each termination.
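To make the terminate-and-rerun idea concrete, here is a rough sketch of the order of statements it implies. It only builds SQL strings and does not talk to a cluster. The function name, the table names, and the process id are all hypothetical. It also assumes you stop the vacuum by terminating its session with `pg_terminate_backend`, and that you have already looked up the pid yourself (for example in `stv_recents`).

```python
import re

# Only allow plain (optionally schema-qualified) identifiers in generated SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def vacuum_interleave_plan(big_table, small_tables, vacuum_pid):
    """Return SQL statements, in order, that give up the vacuum slot to the
    small tables and then restart the vacuum on the big table.

    Hypothetical helper: the names and the pid are supplied by the caller.
    """
    for name in [big_table, *small_tables]:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected table name: {name!r}")

    plan = [
        # Look at the running vacuum before interrupting it.
        "SELECT * FROM svv_vacuum_progress;",
        # Terminate the session running the long vacuum. Work from the
        # pass that is in flight is lost; earlier passes stay committed.
        f"SELECT pg_terminate_backend({int(vacuum_pid)});",
    ]
    # Run the short daily vacuums while the slot is free.
    plan += [f"VACUUM FULL {t};" for t in small_tables]
    # Restart the long vacuum; it rescans the table to find where to resume.
    plan.append(f"VACUUM FULL {big_table};")
    return plan


if __name__ == "__main__":
    for stmt in vacuum_interleave_plan("big_events", ["dim_a", "dim_b"], 12345):
        print(stmt)
```

Each round trip like this costs the in-flight pass plus a rescan, so it is probably only worth doing once a day or so, not frequently.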

If you can manage the update/consistency issues, a deep copy can be a faster way to go. You don't have to do it in a single pass; you can do it in parts. You need the space to store the second version of the table, but you won't need the space to sort the whole table in one go. For example, if you have a table with, say, 10 years of data, you can insert the first year into the new table (sorted, of course), then the second, and so on. There may be some partially empty blocks at the boundaries, but these are easy to fix with a delete-only vacuum (or just wait for auto-vacuum to do it).
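A minimal sketch of the year-by-year deep copy, again just generating the SQL and not running it. The function name, table names, and column name are hypothetical. It assumes the date column is the leading sort key, so that appending the years in order leaves the new table sorted, and that `CREATE TABLE ... (LIKE ...)` is an acceptable way to create the copy.

```python
import re

# Only allow plain (optionally schema-qualified) identifiers in generated SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def chunked_deep_copy_sql(src, dst, date_col, first_year, last_year):
    """Return SQL statements that copy `src` into `dst` one year at a time.

    Assumes `date_col` is the leading sort key of the table, so loading
    the years in ascending order keeps `dst` in sort order.
    """
    for name in (src, dst, date_col):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")

    # New, empty table with the same column definitions as the source.
    stmts = [f"CREATE TABLE {dst} (LIKE {src});"]

    for year in range(first_year, last_year + 1):
        # Half-open date range, so no row is copied twice or skipped.
        stmts.append(
            f"INSERT INTO {dst} SELECT * FROM {src} "
            f"WHERE {date_col} >= '{year}-01-01' "
            f"AND {date_col} < '{year + 1}-01-01' "
            f"ORDER BY {date_col};"
        )

    # Tidy up partially filled blocks at the chunk boundaries.
    stmts.append(f"VACUUM DELETE ONLY {dst};")
    return stmts


if __name__ == "__main__":
    for stmt in chunked_deep_copy_sql("events", "events_new", "event_date", 2014, 2023):
        print(stmt)
```

Each `INSERT` only has to sort one year of data, which is where the space saving comes from. Once the copy is complete you would still swap the tables (drop or rename the original, rename the copy) as in the usual deep copy procedure, and deal with any rows written to the original in the meantime.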

If you can do it, the deep copy method will be faster, as it doesn't need to maintain consistency or play nicely with other workloads.


 