
Delta Detection in Oracle table with 2 billion records using composite key

Initial load on Day 1:

| id | key | fkid |
|----|-----|------|
| 1  | 0   | 100  |
| 1  | 1   | 200  |
| 2  | 0   | 300  |

Load on Day 2:

| id | key | fkid |
|----|-----|------|
| 1  | 0   | 100  |
| 1  | 1   | 200  |
| 2  | 0   | 300  |
| 3  | 1   | 400  |
| 4  | 0   | 500  |

Delta records to find in the Day 2 load:

| id | key | fkid |
|----|-----|------|
| 3  | 1   | 400  |
| 4  | 0   | 500  |
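As a sanity check of what "delta" means here, the sketch below loads the two sample snapshots into an in-memory SQLite database and takes the set difference Day 2 minus Day 1. SQLite is only a stand-in for Oracle, chosen because it ships with Python; the table names follow the question's query, and the schema (with `PRIMARY KEY (id, key)`) is an assumption. It reproduces the delta table above.

```python
import sqlite3

# Sample snapshots from the question: (id, key, fkid).
DAY_1 = [(1, 0, 100), (1, 1, 200), (2, 0, 300)]
DAY_2 = [(1, 0, 100), (1, 1, 200), (2, 0, 300), (3, 1, 400), (4, 0, 500)]


def load(conn: sqlite3.Connection) -> None:
    """Create one table per daily load, keyed on the composite key (id, key)."""
    for name, rows in (("table_day_1", DAY_1), ("table_day_2", DAY_2)):
        conn.execute(
            f"CREATE TABLE {name} "
            "(id INTEGER, key INTEGER, fkid INTEGER, PRIMARY KEY (id, key))"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)


def delta_by_set_difference(conn: sqlite3.Connection) -> list:
    """Rows in the Day 2 load that do not appear, with identical values, in Day 1."""
    # SQLite spells this set operator EXCEPT; Oracle's equivalent is MINUS.
    return conn.execute(
        "SELECT id, key, fkid FROM table_day_2 "
        "EXCEPT "
        "SELECT id, key, fkid FROM table_day_1 "
        "ORDER BY id, key"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn)
print(delta_by_set_difference(conn))  # [(3, 1, 400), (4, 0, 500)]
```

Because `EXCEPT` compares whole rows, a row whose `fkid` changed would also show up as a delta row, while a deleted row would not; swapping the two `SELECT`s returns Day 1 rows that were deleted or changed.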

Problem statement: I need to find the delta records in minimum time, given the following facts:

1. Initially I have to process around 2 billion records from a table like the one above.
2. I also need to find each delta in minimal time so that I can process it quickly.

Questions:

1. Will identifying the delta be a time-consuming process, especially during production downtime?
2. How long should it take to identify the delta in a table with 3 numeric columns, of which `id` and `key` form a composite key?

Solution tried:

1. A full outer join that extracts the delta using NVL, but it looks costly:

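```sql
SELECT
    nvl(node1.id, node2.id) id,
    nvl(node1.key, node2.key) key,
    nvl(node1.fkid, node2.fkid) fkid
FROM
    TABLE_DAY_1       node1
    -- id and key form the composite key, so join on both columns
    FULL JOIN TABLE_DAY_2   node2 ON node2.id = node1.id
                                 AND node2.key = node1.key
WHERE
    node2.id IS NULL     -- only in Day 1 (deleted)
    OR node1.id IS NULL; -- only in Day 2 (new)
```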
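Since `id` and `key` together form the key, a join on `id` alone can hide a new row that reuses an existing `id`. The sketch below uses SQLite as a stand-in for Oracle and runs the `node1.id IS NULL` half of the full join as a left anti-join, once on `id` alone and once on (`id`, `key`). The extra Day 2 row `(1, 2, 600)` is hypothetical, added only to expose the difference; the helper name `rows_only_in_day_2` is also illustrative.

```python
import sqlite3

DAY_1 = [(1, 0, 100), (1, 1, 200), (2, 0, 300)]
# Day 2 from the question plus one hypothetical row, (1, 2, 600), that reuses id 1.
DAY_2 = [(1, 0, 100), (1, 1, 200), (2, 0, 300), (3, 1, 400), (4, 0, 500), (1, 2, 600)]

conn = sqlite3.connect(":memory:")
for name, rows in (("table_day_1", DAY_1), ("table_day_2", DAY_2)):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, key INTEGER, fkid INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)


def rows_only_in_day_2(join_condition: str) -> list:
    """Anti-join: Day 2 rows with no Day 1 partner under the given join condition."""
    sql = (
        "SELECT node2.id, node2.key, node2.fkid "
        "FROM table_day_2 node2 "
        "LEFT JOIN table_day_1 node1 ON " + join_condition +
        " WHERE node1.id IS NULL "
        "ORDER BY node2.id, node2.key"
    )
    return conn.execute(sql).fetchall()


id_only = rows_only_in_day_2("node1.id = node2.id")
composite = rows_only_in_day_2("node1.id = node2.id AND node1.key = node2.key")
print(id_only)    # [(3, 1, 400), (4, 0, 500)]: (1, 2, 600) is missed
print(composite)  # [(1, 2, 600), (3, 1, 400), (4, 0, 500)]
```

With the `id`-only condition, `(1, 2, 600)` finds partners among the Day 1 rows with `id = 1`, so it never reaches the `IS NULL` filter.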

You need two separate statements to handle this: one to detect new and changed rows, and another to detect deleted rows.

While it is cumbersome to write, the fastest comparison is field by field, so:

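```sql
-- node1 = newer load (Day 2), node2 = older load (Day 1)
-- col1..col4 are placeholders for the table's non-key columns (e.g. fkid)
SELECT /*+ parallel(8) full(node1) full(node2) USE_HASH(node1 node2) */ *
  FROM table_day_2 node1,
       table_day_1 node2
 WHERE node1.id = node2.id(+)
   AND node1.key = node2.key(+)                   -- composite key: join on id and key
   AND (node2.id IS NULL                          -- new rows
        OR node1.col1 <> node2.col1               -- changed val on non-nullable col
        OR NVL(node1.col3,' ') <> NVL(node2.col3,' ') -- changed val on nullable string
        OR NVL(node1.col4,-1) <> NVL(node2.col4,-1)   -- changed val on nullable numeric, etc.
       );
```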
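The NVL calls matter because a plain `<>` evaluates to NULL, not true, when either side is NULL, so a change to or from NULL slips through. The sketch below runs the new-and-changed query against SQLite (a stand-in for Oracle, using `COALESCE` in place of `NVL` and `LEFT JOIN` in place of `(+)`). The Day 2 updates, including the change of `(2, 0)` to NULL, are hypothetical because the question's sample contains no updates; the helper `new_or_changed` is illustrative.

```python
import sqlite3

DAY_1 = [(1, 0, 100), (1, 1, 200), (2, 0, 300)]
# Hypothetical Day 2: (1, 1) changes 200 -> 250, (2, 0) changes 300 -> NULL,
# and (3, 1) and (4, 0) are new, as in the question.
DAY_2 = [(1, 0, 100), (1, 1, 250), (2, 0, None), (3, 1, 400), (4, 0, 500)]

conn = sqlite3.connect(":memory:")
for name, rows in (("table_day_1", DAY_1), ("table_day_2", DAY_2)):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, key INTEGER, fkid INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)


def new_or_changed(changed_test: str) -> list:
    """node1 = newer load (Day 2), node2 = older load (Day 1), joined on (id, key)."""
    sql = (
        "SELECT node1.id, node1.key, node1.fkid "
        "FROM table_day_2 node1 "
        "LEFT JOIN table_day_1 node2 "
        "ON node1.id = node2.id AND node1.key = node2.key "
        "WHERE node2.id IS NULL "  # new rows
        "OR " + changed_test +     # changed fkid
        " ORDER BY node1.id, node1.key"
    )
    return conn.execute(sql).fetchall()


# Plain <> yields NULL when either side is NULL, so the 300 -> NULL change is missed.
plain = new_or_changed("node1.fkid <> node2.fkid")
# Sentinel pattern from the answer; -1 must never occur as a real fkid.
sentinel = new_or_changed("COALESCE(node1.fkid, -1) <> COALESCE(node2.fkid, -1)")
# SQLite's IS NOT is a null-safe inequality, an alternative to the sentinel.
null_safe = new_or_changed("node1.fkid IS NOT node2.fkid")
print(plain)     # [(1, 1, 250), (3, 1, 400), (4, 0, 500)]
print(sentinel)  # [(1, 1, 250), (2, 0, None), (3, 1, 400), (4, 0, 500)]
```

The sentinel trick trades correctness at the edges for simplicity: if a real value equals the sentinel, a change from NULL to that value goes undetected.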

Then, for deleted rows:

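```sql
-- node1 = newer load (Day 2), node2 = older load (Day 1)
SELECT /*+ parallel(8) full(node1) full(node2) USE_HASH(node1 node2) */ node2.id, node2.key
  FROM table_day_2 node1,
       table_day_1 node2
 WHERE node1.id(+) = node2.id
   AND node1.key(+) = node2.key  -- composite key: join on id and key
   AND node1.id IS NULL;         -- deleted rows: in Day 1 but not in Day 2
```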
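A quick way to check the deleted-rows logic is to run it on a small case where one key disappears. The sketch below uses SQLite as a stand-in for Oracle, with a hypothetical Day 2 load from which `(2, 0)` has been removed, and writes the anti-join twice: as the outer join from the answer and as an equivalent `NOT EXISTS`.

```python
import sqlite3

DAY_1 = [(1, 0, 100), (1, 1, 200), (2, 0, 300)]
# Hypothetical Day 2 in which (2, 0) was deleted and (3, 1), (4, 0) are new.
DAY_2 = [(1, 0, 100), (1, 1, 200), (3, 1, 400), (4, 0, 500)]

conn = sqlite3.connect(":memory:")
for name, rows in (("table_day_1", DAY_1), ("table_day_2", DAY_2)):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, key INTEGER, fkid INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# Outer-join form: keep every old row, then drop those that found a new partner.
deleted = conn.execute(
    "SELECT node2.id, node2.key "
    "FROM table_day_1 node2 "
    "LEFT JOIN table_day_2 node1 ON node1.id = node2.id AND node1.key = node2.key "
    "WHERE node1.id IS NULL "
    "ORDER BY node2.id, node2.key"
).fetchall()

# The same anti-join written with NOT EXISTS.
deleted_not_exists = conn.execute(
    "SELECT node2.id, node2.key FROM table_day_1 node2 "
    "WHERE NOT EXISTS (SELECT 1 FROM table_day_2 node1 "
    "WHERE node1.id = node2.id AND node1.key = node2.key) "
    "ORDER BY node2.id, node2.key"
).fetchall()

print(deleted)  # [(2, 0)]
```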

You will want to make sure Oracle does a full table scan. If you have lots of CPUs and parallel query is enabled on your database, make sure the query runs in parallel (hence the hints). You also want a hash join between the two tables. Work with your DBA to ensure you have enough temporary space to pull this off, and enough PGA to handle the join with at least a one-pass work area rather than a multipass one.
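The hash-join and one-pass advice can be pictured with a toy Python model: build a hash table on one input keyed by (`id`, `key`), then stream the other input through it exactly once. This is a simplification for intuition only, not Oracle's implementation; a real database partitions the build input and spills to temporary space when the hash table does not fit in the work area, which is what the temp and PGA sizing is about. Unlike the two SQL statements, this model classifies new, changed, and deleted rows in one pass. The function name `hash_join_delta` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[int, int, Optional[int]]  # (id, key, fkid); fkid may be NULL (None)


def hash_join_delta(old_rows: Iterable[Row], new_rows: Iterable[Row]):
    """Toy model of a hash join on the composite key (id, key).

    Build phase: the whole old load goes into an in-memory hash table; in a
    database this is the part that must fit in the work area for a one-pass join.
    Probe phase: the new load is streamed through once, one lookup per row.
    """
    build: Dict[Tuple[int, int], Optional[int]] = {(i, k): f for i, k, f in old_rows}
    new: List[Row] = []
    changed: List[Row] = []
    for i, k, f in new_rows:
        if (i, k) not in build:
            new.append((i, k, f))
        else:
            old_f = build.pop((i, k))  # matched: remove it, so leftovers are deletions
            # None != None is False and None != 300 is True, so this is null-safe.
            if old_f != f:
                changed.append((i, k, f))
    deleted = sorted(build)  # old keys that no new row matched
    return new, changed, deleted


DAY_1 = [(1, 0, 100), (1, 1, 200), (2, 0, 300)]
DAY_2 = [(1, 0, 100), (1, 1, 200), (2, 0, 300), (3, 1, 400), (4, 0, 500)]
new, changed, deleted = hash_join_delta(DAY_1, DAY_2)
print(new, changed, deleted)  # [(3, 1, 400), (4, 0, 500)] [] []
```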
