
Update a column value for 500 million rows in an interval-partitioned table

We have a table with 10 billion rows. The table is interval-partitioned on a date column. In one subpartition we need to update that date to a new value for the 500 million rows that match our criteria. Because the table is partitioned on that same date, this will definitely affect partitioning, for example by triggering the creation of new partitions. Could anyone give me pointers on the best approach to follow?

Thanks in advance!

If you are going to update the partitioning key and the source rows are in a single (sub)partition, then a reasonable approach would be:

  1. Create a temporary table for the updated rows. If possible, perform the update on the fly:

     CREATE TABLE updated_rows AS SELECT add_months(partition_key, 1) AS partition_key, other_columns... FROM original_table PARTITION (xxx) WHERE ...;
  2. Drop the original (sub)partition. This removes every row in it, so the temporary table must hold all the rows you want to keep:

     ALTER TABLE original_table DROP PARTITION xxx;
  3. Reinsert the updated rows:

     INSERT /*+ APPEND */ INTO original_table SELECT * FROM updated_rows;

If you run into problems with CTAS or INSERT INTO ... SELECT for 500M rows, consider partitioning the temporary table and moving the data in batches.

Hmm... If you have enough space, I would create a copy of the source table that contains the correctly updated rows, check the results, drop the source table, and finally rename the copy to the source table's name. Yes, this takes a long time to run, but it can be a painless way to do it. A parallel hint is needed, of course.
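A rough sketch of that copy-and-rename idea in Oracle SQL. Everything specific here is an assumption for illustration: the names original_table_copy, other_col and status, the predicate status = 'X' standing in for the real criteria, the one-month shift, the monthly interval clause, and the parallel degree of 8. The copy has to be created with the same partitioning scheme as the real table.

```sql
-- Hypothetical names and partitioning clause; adjust to match the real table.
CREATE TABLE original_table_copy
  PARTITION BY RANGE (partition_key)
  INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
  (PARTITION p_first VALUES LESS THAN (DATE '2000-01-01'))
  PARALLEL 8
AS
SELECT /*+ PARALLEL(t, 8) */
       -- Apply the new date on the fly, only to rows matching the (assumed) criteria.
       CASE WHEN t.status = 'X'
            THEN ADD_MONTHS(t.partition_key, 1)
            ELSE t.partition_key
       END AS partition_key,
       t.other_col,
       t.status
FROM   original_table t;

-- Check the results before swapping, e.g. compare row counts.
SELECT COUNT(*) FROM original_table;
SELECT COUNT(*) FROM original_table_copy;

-- Swap the tables once the copy looks right.
DROP TABLE original_table;
ALTER TABLE original_table_copy RENAME TO original_table;
```

Indexes, constraints, grants and triggers are not carried over by CREATE TABLE ... AS SELECT, so they would have to be recreated on the copy. Renaming the source table out of the way instead of dropping it right away is a more cautious variant if space allows.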

You could add a new flag column, 'updated', to your table, defaulting to NULL (or 0; I prefer NULL). Then, using the date criteria for the rows you need to change, you can update the data group by group in the same way Kombajn described. Once a group of data has been updated, set its 'updated' flag to 1.

For example, let's start by splitting the data into groups, taking the year as the grouping criterion, and process the data year by year.

  1. Create a temporary table for year 1 (p2001 stands for the partition that holds that year):

CREATE TABLE updated_rows AS SELECT columns... FROM original_table PARTITION (p2001) WHERE YEAR = 2001 ...;

  2. Drop the original (sub)partition:

ALTER TABLE original_table DROP PARTITION p2001;

  3. Reinsert the updated rows:

INSERT /*+ APPEND */ INTO original_table(columns..., updated) SELECT columns..., 1 FROM updated_rows;

Hope this helps you process the data step by step, instead of waiting for all the data in the table to be updated at once. You could use a cursor that loops over the years.
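One way the year-by-year loop could look in PL/SQL, as a sketch under several assumptions: yearly interval partitions that already exist for every year in the range, hypothetical columns partition_key and other_col, a one-month shift as the new value, and every unflagged row in a year qualifying for the update. It addresses partitions with PARTITION FOR because interval partitions get system-generated names.

```sql
-- Sketch only: table, columns, year range and the shift are placeholders.
DECLARE
  v_part VARCHAR2(100);
BEGIN
  FOR y IN 2001 .. 2005 LOOP
    -- Address the partition that contains 1 January of year y.
    v_part := 'PARTITION FOR (TO_DATE(''' || y || '-01-01'', ''YYYY-MM-DD''))';

    -- 1. Stage every row of the year; shift only rows not flagged yet,
    --    so rows moved in by an earlier iteration are not shifted twice.
    EXECUTE IMMEDIATE
      'CREATE TABLE updated_rows AS
       SELECT CASE WHEN updated IS NULL
                   THEN ADD_MONTHS(partition_key, 1)
                   ELSE partition_key
              END AS partition_key,
              other_col,
              1 AS updated
       FROM   original_table ' || v_part;

    -- 2. Drop the year's partition (this removes all of its rows).
    EXECUTE IMMEDIATE 'ALTER TABLE original_table DROP ' || v_part;

    -- 3. Direct-path insert; interval partitioning creates target partitions as needed.
    EXECUTE IMMEDIATE
      'INSERT /*+ APPEND */ INTO original_table (partition_key, other_col, updated)
       SELECT partition_key, other_col, updated FROM updated_rows';
    COMMIT;

    -- Clean up the staging table before the next year.
    EXECUTE IMMEDIATE 'DROP TABLE updated_rows PURGE';
  END LOOP;
END;
/
```

DDL statements commit implicitly, so if the block fails between steps 2 and 3 the year's rows exist only in updated_rows; it is worth checking the staging table's row count before the drop. The flag is what makes the ascending loop safe here: a row shifted from December into the next year arrives already marked and is left alone.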


 