How to compare the data of a large table (100 million rows) between Oracle and SQL Server

Question

I have a process that populates an Oracle table with more than 100 million rows. The table structure is as follows:

**ORACLE_TABLE**
id|contractdatetime|Attr3|Attr4|Attr5

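The combination (id, contractdatetime) is unique in this table, which is populated by an external process.

There are only about 30,000 distinct id values in total. id alone is not unique: each id has many rows, and within one id every contractdatetime occurs only once, so the combination (id, contractdatetime) is unique.

Another process populates an identical table in SQL Server: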

**SQLSERVER_TABLE**
id|contractdatetime|Attr3|Attr4|Attr5

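I am trying to work out the best way to check whether the data in the two tables is identical. My idea is to compute, in Oracle, one hashed value per contractid (the id column above) that somehow aggregates all the other attributes. If I can do the same in SQL Server, I will be able to compare the results (about 30,000 rows) in Excel itself.

I have searched Stack Overflow but could not find an equivalent function for MD5_XOR, or anything else that would help achieve this; see the link below. http://www.db-nemec.com/MD5/CompareTablesUsingMD5Hash.html

Other options, such as using a linked server, would be harder to get approved.

Is there a good way to go about this?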

Answer 1

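For a quick, high-level comparison between Oracle and SQL Server tables, you can aggregate the results of the functions STANDARD_HASH (Oracle) and HASHBYTES (SQL Server).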
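
To make the idea concrete before the database-specific code, here is a minimal Python sketch of the same scheme: build one string per row, hash it with MD5, keep the first 7 bytes as an unsigned integer, and sum those integers per id. The function names (`row_key`, `checksums_by_id`) and the in-memory rows are illustrative assumptions that mirror the four sample rows used in the database code; whether the numbers match what a database produces also depends on both sides encoding the strings identically.

```python
import hashlib
from collections import defaultdict

def row_key(contractdatetime, attr3, attr4, attr5):
    """Build the string that gets hashed, marking NULLs and column boundaries."""
    def nv(value):
        return "null" if value is None else value
    return (nv(contractdatetime)
            + "-1-" + nv(attr3) + "-2-" + nv(attr4) + "-3-" + nv(attr5))

def checksums_by_id(rows):
    """Sum a truncated MD5 of every row, grouped by id (row order does not matter)."""
    totals = defaultdict(int)
    for row_id, contractdatetime, attr3, attr4, attr5 in rows:
        # Assumption: the data is plain ASCII, so both databases see the same bytes.
        digest = hashlib.md5(
            row_key(contractdatetime, attr3, attr4, attr5).encode("ascii")
        ).digest()
        # First 7 bytes = first 14 hex characters, read as an unsigned integer.
        totals[row_id] += int.from_bytes(digest[:7], "big")
    return dict(totals)

oracle_rows = [
    (1, "2000-01-01 00:00:00", "a", "a", "a"),
    (2, "2000-01-01 00:00:00", "b", "b", "b"),
    (2, "2000-01-02 00:00:00", None, None, None),
    (3, "2000-01-02 00:00:00", "Oracle", "Oracle", "Oracle"),
]
sqlserver_rows = oracle_rows[:3] + [
    (3, "2000-01-02 00:00:00", "SQL Server", "SQL Server", "SQL Server"),
]

oracle_sums = checksums_by_id(oracle_rows)
sqlserver_sums = checksums_by_id(sqlserver_rows)
differing_ids = sorted(i for i in oracle_sums if oracle_sums[i] != sqlserver_sums.get(i))
print(differing_ids)
```

Summing (rather than concatenating) the per-row hashes is what makes the result independent of row order, so neither database needs to sort 100 million rows.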

Oracle code

--Create a simple table.
create table table1
(
    id number,
    contractdatetime date,
    Attr3 varchar2(100),
    Attr4 varchar2(100),
    Attr5 varchar2(100)
);

--Insert 4 rows, the first three will be identical between databases,
--the last row will be different.
insert into table1 values (1, date '2000-01-01', 'a', 'a', 'a');
insert into table1 values (2, date '2000-01-01', 'b', 'b', 'b');
insert into table1 values (2, date '2000-01-02', null, null, null);
insert into table1 values (3, date '2000-01-02', 'Oracle', 'Oracle', 'Oracle');
commit;

select
    id,
    --Format the number
    trim(to_char(
        --Sum per group.
        sum(
            --Convert to a number.
            to_number(
                --Get the first 14 hex characters (7 bytes). This seems to be the maximum that SQL Server can handle
                --before it runs into math errors.
                substr(
                    --Hash the value.
                    standard_hash(
                        --Concatenate the values using (hopefully) unique strings to separate the
                        --columns and represent NULLs (because the hashing functions treat nulls differently.)
                        nvl(to_char(contractdatetime, 'YYYY-MM-DD HH24:MI:SS'), 'null') ||
                        '-1-' || nvl(attr3, 'null') || '-2-' || nvl(attr4, 'null') || '-3-' || nvl(attr5, 'null')
                        , 'MD5')
                    , 1, 14)
                , 'xxxxxxxxxxxxxxxxxxxx'))
        , '99999999999999999999')) hash
from table1
group by id
order by 1;

SQL Server code

create table table1
(
    id numeric,
    contractdatetime datetime,
    Attr3 varchar(100),
    Attr4 varchar(100),
    Attr5 varchar(100)
);

--No explicit commit: by default SQL Server autocommits each statement unless a transaction is open.
insert into table1 values (1, cast('2000-01-01 00:00:00.000' as datetime), 'a', 'a', 'a');
insert into table1 values (2, cast('2000-01-01 00:00:00.000' as datetime), 'b', 'b', 'b');
insert into table1 values (2, cast('2000-01-02 00:00:00.000' as datetime), null, null, null);
insert into table1 values (3, cast('2000-01-02 00:00:00.000' as datetime), 'SQL Server', 'SQL Server', 'SQL Server');

select
    id,
    --Sum per group.
    sum(
        --Convert the 7 bytes to a number.
        convert(bigint, convert(varbinary,
            --Get the first 7 bytes, which match the 14 hex characters used in Oracle.
            substring(
                hashbytes('MD5',
                    --Style 20 formats the datetime as yyyy-mm-dd hh:mi:ss, like the Oracle format mask.
                    isnull(convert(varchar(19), contractdatetime, 20), 'null') +
                    '-1-' + isnull(attr3, 'null') + '-2-' + isnull(attr4, 'null') + '-3-' + isnull(attr5, 'null'))
                , 1, 7)
            , 1))) hash
from table1
group by id
order by 1;

Results

As expected, the hashes for the first two groups are identical, and the hash for the third group is different.

Oracle:

ID  HASH
1   50696302970576522
2   69171702324546493
3   50787287321473273

SQL Server:

ID  HASH
1   50696302970576522
2   69171702324546493
3   7440319042693061

Here is an Oracle fiddle and a SQL Server fiddle.
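
With about 30,000 ids, the final comparison does not have to be done by eye. As a rough sketch, the two query outputs could be saved as text and compared with a few lines of Python. The helper names `parse_result` and `compare` are made up for illustration, and the input format is assumed to be the whitespace-separated `ID  HASH` layout shown in the results.

```python
def parse_result(text):
    """Parse 'ID  HASH' query output into {id: hash}, skipping the header line."""
    result = {}
    for line in text.strip().splitlines()[1:]:
        row_id, hash_value = line.split()
        result[int(row_id)] = int(hash_value)
    return result

def compare(left, right):
    """Return (ids only in left, ids only in right, ids whose hashes differ)."""
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    mismatched = sorted(i for i in left.keys() & right.keys() if left[i] != right[i])
    return only_left, only_right, mismatched

oracle_output = """
ID  HASH
1   50696302970576522
2   69171702324546493
3   50787287321473273
"""
sqlserver_output = """
ID  HASH
1   50696302970576522
2   69171702324546493
3   7440319042693061
"""

only_oracle, only_sqlserver, mismatched = compare(
    parse_result(oracle_output), parse_result(sqlserver_output)
)
print(only_oracle, only_sqlserver, mismatched)  # [] [] [3]
```

Any id reported as mismatched, or present on only one side, can then be checked row by row in both databases.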

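Problems

I think this solution will only work if the databases use similar character sets, or perhaps only if the data is limited to the first 127 ASCII characters, which are usually encoded the same way in different character sets.
The chance of hash collisions is high (perhaps unreasonably high). MD5 hashes are not strong enough to resist cryptographic attacks, but they are good enough for comparing data sets. The problem is that I had to take a substring of the hash to make the math work in SQL Server. This may be my fault for not understanding SQL Server well enough: BIGINT should support about 19 digits of precision, but my math only works up to 14 hex digits. I may have a conversion error somewhere. If you run into too many collisions or overflow problems, you may need to adjust the "14" and "7" numbers. (It is 14 for Oracle because it counts the hexadecimal characters displayed, and 7 for SQL Server because it counts bytes; each hexadecimal character represents 0.5 bytes.)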
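
The "14" and "7" limits can be explored with a little arithmetic. The sketch below is plain Python, not database code, and it assumes BIGINT is a signed 64-bit integer. It shows that a 7-byte slice always fits, that an 8-byte slice can turn negative when read as signed (one plausible reason a longer slice produced different numbers on the two sides), and how many maximal 7-byte values can be added before a signed 64-bit sum overflows.

```python
BIGINT_MAX = 2**63 - 1          # largest signed 64-bit integer
MAX_7_BYTES = 2**56 - 1         # largest value of a 7-byte (14 hex digit) slice

# A 7-byte slice always fits in a signed 64-bit integer.
fits = MAX_7_BYTES <= BIGINT_MAX

# An 8-byte slice with the high bit set depends on the interpretation:
raw = bytes.fromhex("ffffffffffffffff")
as_unsigned = int.from_bytes(raw, "big", signed=False)
as_signed = int.from_bytes(raw, "big", signed=True)

# Worst case: how many maximal 7-byte values fit in one signed 64-bit sum?
max_rows_worst_case = BIGINT_MAX // MAX_7_BYTES

print(fits, as_unsigned, as_signed, max_rows_worst_case)
# True 18446744073709551615 -1 128
```

With roughly 100 million rows spread over about 30,000 ids, a group holds a few thousand rows on average, so a 64-bit sum of 7-byte values may well overflow on real data. Taking a shorter slice on both sides, or summing in a wider numeric type, are possible adjustments, at the cost of more collisions in the first case.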

Answer 1 (3; accepted; 2021-01-24 21:11:04)