Comparing two tables that don't have a unique key

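I need to compare the data in two tables and check which attributes don't match. The tables have the same definition, but the problem is that I have no unique key to compare on. I tried using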

CONCAT(table1.A, table1.B)
= CONCAT(table2.A, table2.B)

but I still get duplicate rows. I also tried using NVL on a few columns, but that didn't help:

SELECT  
    UT.cat,
    PD.cat
FROM 
    EM UT, EM_63 PD 
WHERE 
    NVL(UT.cat, 1) = NVL(PD.cat, 1) AND
    NVL(UT.AT_NUMBER, 1) = NVL(PD.AT_NUMBER, 1) AND
    NVL(UT.OFFSET, 1) = NVL(PD.OFFSET, 1) AND  
    NVL(UT.PROD, 1) = NVL(PD.PROD, 1)
;

One table has 34k records and the other has 35k, but running the query above returns 3 million rows.

Columns in the table:

COUNTRY       
CATEGORY   
TYPE    
DESCRIPTION

Sample data:

Table 1:

COUNTRY  CATEGORY  TYPE  DESCRIPTION
US       C         T1    In
IN       A         T2    OUT
B        C         T2    IN
Y        C         T1    INOUT

Table 2:

COUNTRY  CATEGORY  TYPE  DESCRIPTION
US       C         T2    In
IN       B         T2    Out
Q        C         T2    IN

Expected output:

column       Matched  unmatched
COUNTRY      2        1
CATEGORY     2        1
TYPE         2        1
DESCRIPTION  3        0

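In the most general case (when you may have duplicate rows, and you want to see which rows exist in one table but not in the other, and also which rows exist in both tables but, say, 3 times in the first table and 5 times in the other):

This is a very common problem with a settled "best solution" that, for some reason, most people still seem to be unaware of, even though it was developed on AskTom many years ago and has been presented numerous times.

You don't need a join, you don't need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns that show which table each row comes from, do a UNION ALL, then GROUP BY all columns except the "source" columns and show the count for each table. Something like this: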

select   count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from     (
           select 'x' as t_1, null as t_2, col1, col2, ... 
             from table_1
           union all
           select null as t_1, 'x' as t_2, col1, col2, ...
             from table_2
         )
group by col1, col2, ...
having   count(t_1) != count(t_2)
;
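To try the pattern above without an Oracle instance, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. The two-column tables and their rows are made up for illustration; SQLite is only a stand-in, and Oracle-specific behavior is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (country TEXT, category TEXT);
    CREATE TABLE table_2 (country TEXT, category TEXT);
""")

# Hypothetical data: ('US','C') occurs 3 times in table_1 and 5 times in table_2.
# The (NULL,'Z') row occurs once in each table.
rows_1 = [("US", "C")] * 3 + [("IN", "A"), ("B", "C"), (None, "Z")]
rows_2 = [("US", "C")] * 5 + [("IN", "A"), ("Q", "C"), (None, "Z")]
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows_1)
conn.executemany("INSERT INTO table_2 VALUES (?, ?)", rows_2)

# Tag each row with its source, stack both tables, then count per distinct row.
differences = conn.execute("""
    SELECT COUNT(t_1) AS count_table_1, COUNT(t_2) AS count_table_2, country, category
    FROM (
        SELECT 'x' AS t_1, NULL AS t_2, country, category FROM table_1
        UNION ALL
        SELECT NULL AS t_1, 'x' AS t_2, country, category FROM table_2
    )
    GROUP BY country, category
    HAVING COUNT(t_1) != COUNT(t_2)
    ORDER BY country, category
""").fetchall()

for row in differences:
    print(row)
# (1, 0, 'B', 'C')
# (0, 1, 'Q', 'C')
# (3, 5, 'US', 'C')
```

The ('IN','A') and (NULL,'Z') rows do not appear because their counts are equal in both tables. GROUP BY puts NULLs in the same group, so this approach needs no NVL wrappers.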

Start with this query to check whether these 4 columns form a key.

select      occ_total,occ_ut,occ_pd
           ,count(*)                as records

from       (select      count (*)                               as occ_total
                       ,count (case tab when 'UT' then 1 end)   as occ_ut
                       ,count (case tab when 'PD' then 1 end)   as occ_pd

            from        (           select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD from EM
                        union all   select 'PD'       ,cat,AT_NUMBER,OFFSET,PROD from EM_63 PD
                        ) t

            group by    cat,AT_NUMBER,OFFSET,PROD
            ) t

group by    occ_total,occ_ut,occ_pd     

order by    records desc
;
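As a rough illustration of how to read the result of the key check, the sketch below runs the same query shape in SQLite via Python's sqlite3 module. The tables are cut down to two hypothetical candidate-key columns (cat, prod) with made-up rows; this is an assumption for the example, not the real EM data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE em    (cat TEXT, prod TEXT);
    CREATE TABLE em_63 (cat TEXT, prod TEXT);
""")

# Hypothetical data: ('A','P1') is duplicated in both tables.
conn.executemany("INSERT INTO em VALUES (?, ?)",
                 [("A", "P1")] * 3 + [("B", "P2"), ("E", "P5"), ("C", "P3")])
conn.executemany("INSERT INTO em_63 VALUES (?, ?)",
                 [("A", "P1")] * 4 + [("B", "P2"), ("E", "P5"), ("D", "P4")])

# Inner query: occurrences of each candidate key per table.
# Outer query: how many candidate keys share each occurrence pattern.
distribution = conn.execute("""
    SELECT occ_total, occ_ut, occ_pd, COUNT(*) AS records
    FROM (
        SELECT COUNT(*)                             AS occ_total,
               COUNT(CASE tab WHEN 'UT' THEN 1 END) AS occ_ut,
               COUNT(CASE tab WHEN 'PD' THEN 1 END) AS occ_pd
        FROM (
            SELECT 'UT' AS tab, cat, prod FROM em
            UNION ALL
            SELECT 'PD', cat, prod FROM em_63
        ) t
        GROUP BY cat, prod
    ) t
    GROUP BY occ_total, occ_ut, occ_pd
    ORDER BY records DESC, occ_total, occ_ut
""").fetchall()

for row in distribution:
    print(row)
# (2, 1, 1, 2)
# (1, 0, 1, 1)
# (1, 1, 0, 1)
# (7, 3, 4, 1)

# The same duplicates explain the row explosion of a plain join:
joined_rows = conn.execute("""
    SELECT COUNT(*) FROM em ut, em_63 pd
    WHERE ut.cat = pd.cat AND ut.prod = pd.prod
""").fetchone()[0]
print(joined_rows)  # 14 = 3*4 + 1*1 + 1*1
```

The columns form a key only if no row of the result has occ_ut or occ_pd greater than 1. Here the (7, 3, 4, 1) row shows one value combination appearing 3 times in one table and 4 times in the other, which alone contributes 12 rows to the join.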

Once you have chosen a "key", you can use the following query to see the values of the attributes:

select      count (*)                               as occ_total
           ,count (case tab when 'UT' then 1 end)   as occ_ut
           ,count (case tab when 'PD' then 1 end)   as occ_pd

           ,count (distinct att1)                   as cnt_dst_att1
           ,count (distinct att2)                   as cnt_dst_att2
           ,count (distinct att3)                   as cnt_dst_att3
           ,...
           ,listagg (case tab when 'UT' then att1 end) within group (order by att1) as att1_vals_ut
           ,listagg (case tab when 'PD' then att1 end) within group (order by att1) as att1_vals_pd
           ,listagg (case tab when 'UT' then att2 end) within group (order by att2) as att2_vals_ut
           ,listagg (case tab when 'PD' then att2 end) within group (order by att2) as att2_vals_pd
           ,listagg (case tab when 'UT' then att3 end) within group (order by att3) as att3_vals_ut
           ,listagg (case tab when 'PD' then att3 end) within group (order by att3) as att3_vals_pd  
           ,...

from        (           select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM
            union all   select 'PD'       ,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM_63 PD
            ) t

group by    cat,AT_NUMBER,OFFSET,PROD
;
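The attribute query can be sketched the same way in SQLite through Python's sqlite3 module. Several things here are assumptions for the example: SQLite's GROUP_CONCAT stands in for Oracle's LISTAGG, the key is reduced to (cat, prod), there is a single attribute att1, the rows are made up, and the key columns are added to the select list so each output row can be identified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE em    (cat TEXT, prod TEXT, att1 TEXT);
    CREATE TABLE em_63 (cat TEXT, prod TEXT, att1 TEXT);
""")

# Hypothetical data: (cat, prod) is unique in each table; att1 differs for ('B','P2').
conn.executemany("INSERT INTO em VALUES (?, ?, ?)",
                 [("A", "P1", "red"), ("B", "P2", "blue")])
conn.executemany("INSERT INTO em_63 VALUES (?, ?, ?)",
                 [("A", "P1", "red"), ("B", "P2", "green")])

# One output row per key, with each table's attribute values side by side.
comparison = conn.execute("""
    SELECT cat, prod,
           COUNT(*)                                      AS occ_total,
           COUNT(CASE tab WHEN 'UT' THEN 1 END)          AS occ_ut,
           COUNT(CASE tab WHEN 'PD' THEN 1 END)          AS occ_pd,
           COUNT(DISTINCT att1)                          AS cnt_dst_att1,
           GROUP_CONCAT(CASE tab WHEN 'UT' THEN att1 END) AS att1_vals_ut,
           GROUP_CONCAT(CASE tab WHEN 'PD' THEN att1 END) AS att1_vals_pd
    FROM (
        SELECT 'UT' AS tab, cat, prod, att1 FROM em
        UNION ALL
        SELECT 'PD', cat, prod, att1 FROM em_63
    ) t
    GROUP BY cat, prod
    ORDER BY cat, prod
""").fetchall()

for row in comparison:
    print(row)
# ('A', 'P1', 2, 1, 1, 1, 'red', 'red')
# ('B', 'P2', 2, 1, 1, 2, 'blue', 'green')
```

A cnt_dst_att1 greater than 1 marks a key whose attribute differs between the tables. COUNT(DISTINCT ...) ignores NULLs, so a NULL on one side and a value on the other still yields 1; the two value-list columns are what reveal that case.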

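The problem with CONCAT is that you can get false matches if your data looks like this:

table1.A = '123'
table1.B = '456'

Concatenated: '123456'

table2.A = '12'
table2.B = '3456'

Also concatenated: '123456'

You have to compare the fields separately: table1.A = table2.A AND table1.B = table2.B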
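The collision is easy to reproduce. The sketch below uses SQLite's || operator through Python's sqlite3 module as a stand-in for CONCAT, with the values from the example above; the column names a1, b1, a2, b2 are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare the two rows once by concatenation and once column by column.
concat_equal, columns_equal = conn.execute("""
    SELECT (a1 || b1) = (a2 || b2),
           (a1 = a2 AND b1 = b2)
    FROM (SELECT '123' AS a1, '456' AS b1, '12' AS a2, '3456' AS b2)
""").fetchone()

print(concat_equal)   # 1: both sides concatenate to '123456'
print(columns_equal)  # 0: the individual columns differ
```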
