Update million rows using rowids from one table to another Oracle

Hi, I have two tables with a million rows in each. I have Oracle 11g R1. I am sure many of us must have gone through this situation.

What is the most efficient and fastest way to update rows from one table to another where the values are DIFFERENT?

Eg: Table 1 has 4 NUMBER columns with high precision, e.g. 0.2212454215454212.

Table 2 has 6 columns. Update table 2's four columns based on a common column in both tables, only where the values differ.

I have something like this:

DECLARE
  TYPE test1_t IS TABLE OF test.score%TYPE INDEX BY PLS_INTEGER;
  TYPE test2_t IS TABLE OF test.id%TYPE INDEX BY PLS_INTEGER;
  TYPE test3_t IS TABLE OF test.urank%TYPE INDEX BY PLS_INTEGER;

  vscore test1_t;
  vid    test2_t;
  vurank test3_t;
BEGIN
  SELECT id, score, urank
    BULK COLLECT INTO vid, vscore, vurank
    FROM test;

  FORALL i IN 1 .. vid.COUNT
    MERGE INTO final T
    USING (SELECT vid (i)    AS o_id,
                  vurank (i) AS o_urank,
                  vscore (i) AS o_score
           FROM DUAL) S
    ON (S.o_id = T.id)
    WHEN MATCHED THEN
      UPDATE SET T.crank = S.o_urank
      WHERE T.crank <> S.o_urank;
END;

Since the numbers have high precision, is that slowing it down?

I tried the Bulk Collect and Merge combination, but it still takes ~30 minutes in the worst-case scenario where I have to update 1 million rows.

Is there something that can be done with rowids? Help will be appreciated.

If you want to update all the rows, then just use update:

update table1
set    (col1,
        col2) = (
         select col1,
                col2
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b)
-- guard so that rows with no match in table2 are not set to null:
where  exists (
         select 1
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b)

Bulk collect or any PL/SQL technique will always be slower than a pure SQL technique.

The numeric precision is probably not significant, and rowid is not relevant as there is no common value between the two tables.

When dealing with millions of rows, parallel DML is a game changer. Of course you need to have Enterprise Edition to use parallel, but it's really the only thing which will make much difference.
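
For illustration, parallel DML must be enabled explicitly in the session before the statement runs. A minimal sketch, reusing the update shown above; the degree of parallelism of 8 is an arbitrary assumption:

alter session enable parallel dml;

update /*+ parallel(table1 8) */ table1
set    (col1,
        col2) = (
         select col1,
                col2
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b)
where  exists (
         select 1
         from   table2
         where  table2.col_a = table1.col_a and
                table2.col_b = table1.col_b);

commit; -- a parallel DML transaction must be committed before the session can query the table again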

I recommend you read an article on OraFAQ by rleishman comparing 8 Bulk Update Methods. His key finding is that "the cost of disk reads so far outweighs the context switches that that they are barely noticable (sic)". In other words, unless your data is already cached in memory there really isn't a significant difference between SQL and PL/SQL approaches.

The article does have some neat suggestions on employing parallelism. The surprising outcome is that a parallel pipelined function offers the best performance.
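
To give a flavor of that approach, here is a minimal sketch of a parallel-enabled pipelined function. This is not the article's code; the object types, the function name and the per-row work are invented placeholders:

CREATE TYPE test_row_t AS OBJECT (id NUMBER, score NUMBER, urank NUMBER);
/
CREATE TYPE test_tab_t AS TABLE OF test_row_t;
/
CREATE OR REPLACE FUNCTION transform_rows (p_cur SYS_REFCURSOR)
  RETURN test_tab_t
  PIPELINED
  PARALLEL_ENABLE (PARTITION p_cur BY ANY) -- lets Oracle spread the cursor rows across parallel slaves
IS
  v_id    NUMBER;
  v_score NUMBER;
  v_urank NUMBER;
BEGIN
  LOOP
    FETCH p_cur INTO v_id, v_score, v_urank;
    EXIT WHEN p_cur%NOTFOUND;
    -- per-row work goes here
    PIPE ROW (test_row_t(v_id, v_score, v_urank));
  END LOOP;
  RETURN;
END;
/

-- consumed from plain SQL, e.g. to stage the new values:
SELECT * FROM TABLE(transform_rows(CURSOR(SELECT id, score, urank FROM test)));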

Focusing on the syntax that has been used, and skipping the logic (maybe a pure update + pure insert would solve the problem; also consider the merge cost, indexes, a possible full scan on merge, and so on):
You should use LIMIT in the BULK COLLECT syntax.
Using a bulk collect with no limit:

  1. Will cause all records to be loaded into memory.
  2. With no partially committed merges, you will create a large redo log that must be applied at the end of the process.

Both will result in low performance.

DECLARE
  v_fetchSize NUMBER := 1000; -- based on hardware and design; can be scaled
  CURSOR a_cur IS
    SELECT id, score, urank FROM test;
  -- separate scalar collections: on 11g, FORALL cannot reference
  -- individual fields of a collection of records
  TYPE id_t    IS TABLE OF test.id%TYPE;
  TYPE score_t IS TABLE OF test.score%TYPE;
  TYPE urank_t IS TABLE OF test.urank%TYPE;
  vid    id_t;
  vscore score_t;
  vurank urank_t;
BEGIN
  OPEN a_cur;
  LOOP
    FETCH a_cur BULK COLLECT INTO vid, vscore, vurank LIMIT v_fetchSize;
    FORALL i IN 1 .. vid.COUNT
      -- FORALL takes exactly one DML statement; here, the MERGE from the question
      MERGE INTO final T
      USING (SELECT vid (i) AS o_id, vurank (i) AS o_urank FROM DUAL) S
      ON (S.o_id = T.id)
      WHEN MATCHED THEN
        UPDATE SET T.crank = S.o_urank
        WHERE T.crank <> S.o_urank;
    COMMIT; -- partial commits keep the redo/undo generated per batch small
    EXIT WHEN a_cur%NOTFOUND;
  END LOOP;
  CLOSE a_cur;
END;
  1. Just to be sure: test.id and final.id must be indexed.

  2. With the first select ... from test you get too many records from Table 1, and after that you need to compare all of them with the records in Table 2. Try to select only what you need to update. So, there are at least 2 variants:

a) Select only the changed records:

  SELECT source_table.id, source_table.score, source_table.urank 
  BULK COLLECT INTO vid,vscore,vurank 
  FROM 
    test source_table, 
    final destination_table
  where 
    source_table.id = destination_table.id 
    and
    source_table.crank <> destination_table.crank
   ;

b) Add a new field to the source table with a datetime value, and fill it in a trigger with the current time. While synchronizing, pick only the records changed during the last day. This field needs to be indexed.
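
A minimal sketch of variant b), assuming a new column called last_changed; the column, index and trigger names are invented for illustration:

ALTER TABLE test ADD (last_changed DATE);
CREATE INDEX test_last_changed_ix ON test (last_changed);

CREATE OR REPLACE TRIGGER test_last_changed_trg
  BEFORE INSERT OR UPDATE ON test
  FOR EACH ROW
BEGIN
  :NEW.last_changed := SYSDATE; -- stamp every inserted or updated row with the current time
END;
/

-- the synchronization then picks only rows changed during the last day:
SELECT id, score, urank
FROM   test
WHERE  last_changed >= SYSDATE - 1;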

After such a change, on the update phase you don't need to compare the other fields, only match the IDs:

  FORALL i IN 1 .. vid.COUNT
    MERGE INTO FINAL T
    USING (
      SELECT vid (i) AS o_id,
             vurank (i) AS o_urank,
             vscore (i) AS o_score FROM DUAL
    ) S
    ON (S.o_id = T.id)
    WHEN MATCHED
    THEN UPDATE SET T.crank = S.o_urank;

If you worry about the size of the undo/redo segments, then variant b) is more useful, because you can take records from the source Table 1 divided into time slices and commit the changes after updating every slice, e.g. from 00:00 to 01:00, from 01:00 to 02:00, etc. In this variant the update can be done by a plain SQL statement, without selecting data into collections, while maintaining acceptable sizes of the redo/undo logs.
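
A sketch of one such slice as a plain SQL statement, reusing the hypothetical last_changed column from above; the slice boundaries are placeholders:

MERGE INTO final T
USING (SELECT id, urank
       FROM   test
       WHERE  last_changed >= TO_DATE('2012-09-20 00:00', 'YYYY-MM-DD HH24:MI')
       AND    last_changed <  TO_DATE('2012-09-20 01:00', 'YYYY-MM-DD HH24:MI')) S
ON (S.id = T.id)
WHEN MATCHED THEN
  UPDATE SET T.crank = S.urank
  WHERE T.crank <> S.urank;

COMMIT; -- one commit per slice keeps undo/redo at an acceptable size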
