
How to update a table that contains 500 million records?

Table ORIGINAL_ADDRESS contains 500 million records and doesn't have any index on CITY, STREET, or BUILDING.

It only has an index on address_id.

Table EXTERNAL_ADDR contains 6000 records and has no index on CITY, STREET, or BUILDING either. We created it just for this update, so we can modify it however we like.

How can we make the following update fast?

MERGE INTO ORIGINAL_ADDRESS
USING EXTERNAL_ADDR ON (ORIGINAL_ADDRESS.CITY = EXTERNAL_ADDR.CITY
        AND ORIGINAL_ADDRESS.STREET = EXTERNAL_ADDR.STREET
        AND ORIGINAL_ADDRESS.BUILDING = EXTERNAL_ADDR.BUILDING)
WHEN MATCHED THEN UPDATE SET
ORIGINAL_ADDRESS.EXT_ID = EXTERNAL_ADDR.ID

We can limit the number of updated records to 22 million by adding:

 where the_field_without_index = 'Y'
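With that restriction, the full statement might look like the sketch below. Oracle's MERGE allows a WHERE clause on the UPDATE branch of WHEN MATCHED, which is where such a filter on the target table would go (the_field_without_index is the placeholder column name from above):

```sql
MERGE INTO ORIGINAL_ADDRESS
USING EXTERNAL_ADDR ON (ORIGINAL_ADDRESS.CITY = EXTERNAL_ADDR.CITY
        AND ORIGINAL_ADDRESS.STREET = EXTERNAL_ADDR.STREET
        AND ORIGINAL_ADDRESS.BUILDING = EXTERNAL_ADDR.BUILDING)
WHEN MATCHED THEN UPDATE SET
ORIGINAL_ADDRESS.EXT_ID = EXTERNAL_ADDR.ID
-- only touch the restricted subset of rows
WHERE ORIGINAL_ADDRESS.the_field_without_index = 'Y'
```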

Try one of the following 8 approaches. This is a trial-and-error situation that only you can resolve by testing against your own data:

1. Explicit Cursor Loop
2. Implicit Cursor Loop
3. UPDATE with nested SET subquery
4. BULK COLLECT / FORALL UPDATE
5. Updateable Join View
6. MERGE
7. Parallel DML MERGE
8. Parallel PL/SQL
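As an illustration of option 4, here is a hedged sketch of a BULK COLLECT / FORALL loop that updates in batches with periodic commits (the batch size of 50,000 and the intermediate commits are assumptions you would tune; committing inside the loop trades restartability for smaller undo usage):

```sql
DECLARE
  CURSOR c IS
    SELECT o.ROWID AS rid, e.id AS ext_id
    FROM   original_address o
    JOIN   external_addr e
           ON  o.city = e.city
           AND o.street = e.street
           AND o.building = e.building;

  TYPE t_rid IS TABLE OF ROWID;
  TYPE t_id  IS TABLE OF external_addr.id%TYPE;
  l_rids t_rid;
  l_ids  t_id;
BEGIN
  OPEN c;
  LOOP
    -- fetch a batch of matching rowids and their new ext_id values
    FETCH c BULK COLLECT INTO l_rids, l_ids LIMIT 50000;
    EXIT WHEN l_rids.COUNT = 0;

    -- one bulk-bound UPDATE per batch instead of row-by-row updates
    FORALL i IN 1 .. l_rids.COUNT
      UPDATE original_address
      SET    ext_id = l_ids(i)
      WHERE  ROWID  = l_rids(i);

    COMMIT;  -- periodic commit keeps undo small
  END LOOP;
  CLOSE c;
END;
/
```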

Here is an article that explains this best: http://www.orafaq.com/node/2450
However, if you are able to create a new index on the table, that solves your problem.
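For example, a composite index on the three join columns (the index name here is just a suggestion; building it on 500 million rows will itself take significant time and space):

```sql
CREATE INDEX original_address_addr_ix
  ON original_address (city, street, building);
```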

In the absence of indexes, your best hope is a full scan of both the large table and the small table, with a hash join between them. The cost would be a little more than the cost of the two full table scans plus the cost of changing the values, so it is generally determined by the read bandwidth of your storage.

To improve on that you'd have to add indexes.

Hash partitioning might possibly help by reducing the memory the join requires, but indexing would be the first choice, because the small table doesn't sound large enough to cause a problem in that respect.

If starting from scratch, mind, I'd consider adding a computed hash of the combined join columns and indexing that on the large table. It could keep the index smaller.
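One way to sketch that idea (the column and index names are hypothetical, and you would still compare the raw columns in the join to guard against hash collisions) is a virtual column built on Oracle's ORA_HASH, indexed in place of the wider three-column composite:

```sql
-- Hypothetical sketch: hash the concatenated join columns into one
-- virtual column, then index that single narrow column.
ALTER TABLE original_address ADD (
  addr_hash AS (ORA_HASH(city || '|' || street || '|' || building))
);

CREATE INDEX original_address_hash_ix
  ON original_address (addr_hash);
```

The MERGE's ON clause would then match on the same hash expression first, with the original CITY/STREET/BUILDING equality predicates kept as a collision check.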

If you can't work smarter (indexes), work harder (parallelism):

alter session enable parallel dml;

MERGE /*+ parallel */ INTO ORIGINAL_ADDRESS
USING EXTERNAL_ADDR ON (ORIGINAL_ADDRESS.CITY = EXTERNAL_ADDR.CITY
        AND ORIGINAL_ADDRESS.STREET = EXTERNAL_ADDR.STREET
        AND ORIGINAL_ADDRESS.BUILDING = EXTERNAL_ADDR.BUILDING)
WHEN MATCHED THEN UPDATE SET
ORIGINAL_ADDRESS.EXT_ID = EXTERNAL_ADDR.ID

This can significantly improve performance, assuming you have Enterprise Edition, sufficient resources, a sane configuration, and so on.

