
Update million rows using rowids from one table to another Oracle

Hi, I have two tables with a million rows in each. I am on Oracle 11g R1. I am sure many of us must have gone through this situation.

What is the most efficient and fastest way to update from one table to another where the values are DIFFERENT?

E.g.: Table 1 has 4 NUMBER columns with high precision, e.g. 0.2212454215454212.

Table 2 has 6 columns. I need to update four of Table 2's columns based on a column common to both tables, and only for the rows whose values differ.

I have something like this

DECLARE
  TYPE test1_t IS TABLE OF test.score%TYPE INDEX BY PLS_INTEGER;
  TYPE test2_t IS TABLE OF test.id%TYPE    INDEX BY PLS_INTEGER;
  TYPE test3_t IS TABLE OF test.urank%TYPE INDEX BY PLS_INTEGER;

  vscore test1_t;
  vid    test2_t;
  vurank test3_t;
BEGIN
  SELECT id, score, urank
    BULK COLLECT INTO vid, vscore, vurank
    FROM test;

  FORALL i IN 1 .. vid.COUNT
    MERGE INTO final T
    USING (SELECT vid (i)    AS o_id,
                  vurank (i) AS o_urank,
                  vscore (i) AS o_score
             FROM DUAL) S
    ON (S.o_id = T.id)
    WHEN MATCHED THEN
      UPDATE SET T.crank = S.o_urank
      WHERE T.crank <> S.o_urank;
END;

Since the numbers have high precision, is that what is slowing it down?

I tried the Bulk Collect and MERGE combination, but it still takes ~30 minutes in the worst-case scenario where I have to update 1 million rows.

Is there something that can be done with rowid? Help will be appreciated.

If you want to update all the rows, then just use update:

update table1
set    (col1,
        col2) = (
         select col1,
                col2
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b)
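
If not every row in table1 has a match in table2 (or only changed rows should be touched, as in the question), the plain update above would set unmatched rows' columns to NULL, so it is usually paired with a correlated EXISTS. A sketch along the same lines, assuming the same column names:

update table1
set    (col1,
        col2) = (
         select col1,
                col2
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b)
where  exists (
         select 1
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b)

An additional predicate comparing the column values inside the EXISTS subquery would further restrict the update to rows that actually differ.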

Bulk collect or any PL/SQL technique will always be slower than a pure SQL technique.

The numeric precision is probably not significant, and rowid is not relevant because rowids are not shared between the two tables.

When dealing with millions of rows, parallel DML is a game changer. Of course you need to have Enterprise Edition to use parallel, but it's really the only thing which will make much difference.
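
For instance, parallel DML must be enabled at the session level before the statement runs. A minimal sketch against the question's tables (the degree of 8 is an arbitrary example; pick one that suits your hardware):

ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(t 8) */ INTO final t
USING (SELECT id, urank FROM test) s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.crank = s.urank
  WHERE t.crank <> s.urank;

COMMIT;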

I recommend you read an article on OraFAQ by rleishman comparing 8 Bulk Update Methods. His key finding is that "the cost of disk reads so far outweighs the context switches that that they are barely noticable (sic)". In other words, unless your data is already cached in memory there really isn't a significant difference between SQL and PL/SQL approaches.

The article does have some neat suggestions on employing parallel. The surprising outcome is that a parallel pipelined function offers the best performance.
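
A heavily simplified sketch of that approach, with invented object names (parallel_update, upd_count_t) and the question's tables assumed: a parallel-enabled pipelined function runs the DML inside the parallel slaves and is driven from a plain SELECT.

CREATE TYPE upd_count_t AS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION parallel_update (p_cur SYS_REFCURSOR)
  RETURN upd_count_t
  PIPELINED
  PARALLEL_ENABLE (PARTITION p_cur BY ANY)
IS
  PRAGMA AUTONOMOUS_TRANSACTION;  -- needed to do DML from a function called in a query
  v_id    test.id%TYPE;
  v_urank test.urank%TYPE;
  v_cnt   NUMBER := 0;
BEGIN
  LOOP
    FETCH p_cur INTO v_id, v_urank;
    EXIT WHEN p_cur%NOTFOUND;
    UPDATE final
       SET crank = v_urank
     WHERE id = v_id
       AND crank <> v_urank;
    v_cnt := v_cnt + SQL%ROWCOUNT;
  END LOOP;
  COMMIT;
  PIPE ROW (v_cnt);  -- each slave reports how many rows it updated
  RETURN;
END;
/
-- Driven from plain SQL; Oracle distributes the cursor rows over the slaves.
SELECT SUM(column_value) AS rows_updated
  FROM TABLE(parallel_update(CURSOR(SELECT /*+ PARALLEL(t 8) */ id, urank FROM test t)));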

Focusing on the syntax that has been used and skipping the logic (maybe a pure update + pure insert would solve the problem; also consider the merge cost, indexes, a possible full scan on merge, and so on): you should use LIMIT in the BULK COLLECT syntax. Using a bulk collect with no limit:

  1. Will cause all records to be loaded into memory.
  2. With no partially committed merges, you will create a large redo log that must be applied at the end of the process.

Both will result in low performance.

DECLARE
  v_fetchSize NUMBER := 1000; -- based on hardware, design and ... could be scaled
  CURSOR a_cur IS
    SELECT id, score, urank FROM test;
  TYPE myarray IS TABLE OF a_cur%ROWTYPE;
  cur_array myarray;
BEGIN
  OPEN a_cur;
  LOOP
    FETCH a_cur BULK COLLECT INTO cur_array LIMIT v_fetchSize;
    FORALL i IN 1 .. cur_array.COUNT
      -- DO Operation: a single DML statement goes here (see the runnable sketch below)
    COMMIT;
    EXIT WHEN a_cur%NOTFOUND;
  END LOOP;
  CLOSE a_cur;
END;
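
For completeness, here is a runnable variant of that skeleton as a minimal sketch against the question's columns. Some Oracle versions do not allow referencing individual fields of a record collection inside FORALL (PLS-00436), so this variant fetches into separate scalar collections instead of cur_array:

DECLARE
  v_fetchSize CONSTANT PLS_INTEGER := 1000;
  CURSOR a_cur IS SELECT id, urank FROM test;
  TYPE id_t    IS TABLE OF test.id%TYPE;
  TYPE urank_t IS TABLE OF test.urank%TYPE;
  v_ids    id_t;
  v_uranks urank_t;
BEGIN
  OPEN a_cur;
  LOOP
    FETCH a_cur BULK COLLECT INTO v_ids, v_uranks LIMIT v_fetchSize;
    EXIT WHEN v_ids.COUNT = 0;
    FORALL i IN 1 .. v_ids.COUNT
      UPDATE final
         SET crank = v_uranks(i)
       WHERE id = v_ids(i)
         AND crank <> v_uranks(i);
    COMMIT;  -- commit each batch to keep undo/redo small
  END LOOP;
  CLOSE a_cur;
END;
/
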
  1. Just to be sure: test.id and final.id must be indexed.

  2. With the first select ... from test you get too many records from Table 1 , and after that you need to compare all of them with the records in Table 2 . Try to select only what you need to update. So, there are at least 2 variants:

a) select only changed records:

  SELECT source_table.id, source_table.score, source_table.urank 
  BULK COLLECT INTO vid,vscore,vurank 
  FROM 
    test source_table, 
    final destination_table
  where 
    source_table.id = destination_table.id 
    and
    source_table.crank <> destination_table.crank
   ;

b) Add a new field with a datetime value to the source table and fill it with the current time in a trigger. While synchronizing, pick only the records changed during the last day. This field needs to be indexed.
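
A minimal sketch of that change, with assumed column, index, and trigger names:

ALTER TABLE test ADD (last_modified DATE);

CREATE INDEX test_last_modified_idx ON test (last_modified);

CREATE OR REPLACE TRIGGER test_set_last_modified
  BEFORE INSERT OR UPDATE ON test
  FOR EACH ROW
BEGIN
  :NEW.last_modified := SYSDATE;
END;
/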

After such a change, at the update phase you don't need to compare the other fields, only match the IDs:

  FORALL i IN 1 .. vid.COUNT
    MERGE INTO FINAL T
    USING (
      SELECT vid (i) AS o_id,
             vurank (i) AS o_urank,
             vscore (i) AS o_score FROM DUAL
    ) S
    ON (S.o_id = T.id)
    WHEN MATCHED THEN
      UPDATE SET T.crank = S.o_urank;

If you worry about the size of the undo/redo segments, then variant b) is more useful, because you can take the records from source Table 1 divided into time slices and commit the changes after updating every slice, e.g. from 00:00 to 01:00, from 01:00 to 02:00, etc. In this variant the update can be done with just a SQL statement, without selecting the data into collections, while maintaining acceptable sizes of the redo/undo logs.
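
A sketch of such a sliced run, assuming 1-hour slices and the last_modified column from the sketch above; each slice is committed separately to keep undo/redo small:

BEGIN
  FOR h IN 0 .. 23 LOOP
    MERGE INTO final t
    USING (SELECT id, urank
             FROM test
            WHERE last_modified >= TRUNC(SYSDATE) + h / 24
              AND last_modified <  TRUNC(SYSDATE) + (h + 1) / 24) s
    ON (t.id = s.id)
    WHEN MATCHED THEN
      UPDATE SET t.crank = s.urank
      WHERE t.crank <> s.urank;
    COMMIT;  -- one commit per time slice
  END LOOP;
END;
/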
