
How to update a table that contains 500 million records?

Table ORIGINAL_ADDRESS contains 500 million records and doesn't have any index on CITY, STREET, or BUILDING.

It only has an index on address_id.

Table EXTERNAL_ADDR contains 6,000 records and has no index on CITY, STREET, or BUILDING either. We created it just for this update, so we can do whatever we like with it.

How can we make the following update run fast?

MERGE INTO ORIGINAL_ADDRESS
USING EXTERNAL_ADDR ON (ORIGINAL_ADDRESS.CITY = EXTERNAL_ADDR.CITY
        AND ORIGINAL_ADDRESS.STREET = EXTERNAL_ADDR.STREET
        AND ORIGINAL_ADDRESS.BUILDING = EXTERNAL_ADDR.BUILDING)
WHEN MATCHED THEN UPDATE SET
ORIGINAL_ADDRESS.EXT_ID = EXTERNAL_ADDR.ID

We can limit the number of updated records to 22 million by adding:

 where the_field_without_index = 'Y'
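In Oracle, that restriction can be attached directly to the MATCHED branch of the MERGE. A sketch, using the same tables and the placeholder column name from the question:

```sql
MERGE INTO ORIGINAL_ADDRESS
USING EXTERNAL_ADDR
ON (ORIGINAL_ADDRESS.CITY     = EXTERNAL_ADDR.CITY
AND ORIGINAL_ADDRESS.STREET   = EXTERNAL_ADDR.STREET
AND ORIGINAL_ADDRESS.BUILDING = EXTERNAL_ADDR.BUILDING)
WHEN MATCHED THEN UPDATE SET
    ORIGINAL_ADDRESS.EXT_ID = EXTERNAL_ADDR.ID
    -- Oracle permits a WHERE clause on the MATCHED branch. Since the
    -- column has no index this doesn't avoid the full scan, but it
    -- cuts the rows actually written, and the undo/redo they generate.
    WHERE ORIGINAL_ADDRESS.the_field_without_index = 'Y';
```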

Try one of the following eight approaches. This is a trial-and-error situation that only you can resolve by testing against your own data:

1. Explicit Cursor Loop
2. Implicit Cursor Loop
3. UPDATE with nested SET subquery
4. BULK COLLECT / FORALL UPDATE
5. Updateable Join View
6. MERGE
7. Parallel DML MERGE
8. Parallel PL/SQL
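As an illustration, option 4 (BULK COLLECT / FORALL) might look like the sketch below, using the tables from the question. The 50,000-row batch size and the per-batch commit are assumptions you would tune:

```sql
DECLARE
    -- Fetch the rowids to update, joined to the new EXT_ID values.
    CURSOR c IS
        SELECT o.ROWID AS rid, e.ID AS new_ext_id
        FROM   ORIGINAL_ADDRESS o
        JOIN   EXTERNAL_ADDR e
          ON   o.CITY     = e.CITY
         AND   o.STREET   = e.STREET
         AND   o.BUILDING = e.BUILDING;

    TYPE t_rid IS TABLE OF ROWID;
    TYPE t_id  IS TABLE OF EXTERNAL_ADDR.ID%TYPE;
    l_rids t_rid;
    l_ids  t_id;
BEGIN
    OPEN c;
    LOOP
        FETCH c BULK COLLECT INTO l_rids, l_ids LIMIT 50000;
        EXIT WHEN l_rids.COUNT = 0;

        -- One round-trip to the SQL engine per batch.
        FORALL i IN 1 .. l_rids.COUNT
            UPDATE ORIGINAL_ADDRESS
            SET    EXT_ID = l_ids(i)
            WHERE  ROWID  = l_rids(i);

        COMMIT;  -- committing per batch keeps the undo footprint small
    END LOOP;
    CLOSE c;
END;
/
```

Note that a single set-based MERGE is usually faster than this loop; the batched form mainly helps when undo space or restartability is the constraint.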

Here is an article that explains the options well: http://www.orafaq.com/node/2450

However, if you are able to create a new index on the table, that solves the problem directly.

In the absence of indexes, your best hope is a full scan of both the large table and the small table, with a hash join between them. The cost would be a little more than the cost of the two full table scans plus the cost of changing the values, so it is generally bounded by the read bandwidth of your storage.

To improve on that you'd have to add indexes.

Hash partitioning might possibly help by reducing the memory the join requires, but indexing would be the first choice, since the small table doesn't sound large enough to cause a problem in that respect.

If starting from scratch, though, I'd consider adding a computed hash of the combined join columns and indexing that on the large table. It would potentially keep the index smaller.
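A sketch of that idea using a virtual column; ORA_HASH and the '|' separator are assumptions, and any deterministic hash would do:

```sql
-- Add a virtual column hashing the three join columns, then index it.
-- The index stores one number per row instead of three string columns.
ALTER TABLE ORIGINAL_ADDRESS ADD (
    ADDR_HASH AS (ORA_HASH(CITY || '|' || STREET || '|' || BUILDING))
);

CREATE INDEX ORIGINAL_ADDRESS_HASH_IX ON ORIGINAL_ADDRESS (ADDR_HASH);
```

The MERGE would then need to join on ADDR_HASH as well as the three real columns, since hash collisions are possible.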

If you can't work smarter (indexes), work harder (parallelism):

alter session enable parallel dml;

MERGE /*+ parallel */ INTO ORIGINAL_ADDRESS
USING EXTERNAL_ADDR ON (ORIGINAL_ADDRESS.CITY = EXTERNAL_ADDR.CITY
        AND ORIGINAL_ADDRESS.STREET = EXTERNAL_ADDR.STREET
        AND ORIGINAL_ADDRESS.BUILDING = EXTERNAL_ADDR.BUILDING)
WHEN MATCHED THEN UPDATE SET
ORIGINAL_ADDRESS.EXT_ID = EXTERNAL_ADDR.ID

This can significantly improve performance, assuming you have Enterprise Edition, sufficient resources, a sane configuration, etc.
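Parallel DML can fall back to serial execution silently. Assuming you have access to the V$ views, one common check is to query V$PQ_SESSTAT in the same session immediately after the MERGE:

```sql
-- A LAST_QUERY value greater than 0 for 'DML Parallelized' means the
-- MERGE actually ran as parallel DML; 0 means it ran serially.
SELECT STATISTIC, LAST_QUERY
FROM   V$PQ_SESSTAT
WHERE  STATISTIC = 'DML Parallelized';
```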
