
Better design for eliminating duplicates after a UNION query

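I am designing a UNION query to merge two tables with customer information in an Oracle 11g database. The first table, a, is the "main" source; the second table, b, is an additional source that contains both new entries and duplicates.

In practice, UNION cannot eliminate the duplicates in b, because they differ in some fields, for example the auto-increment ID, which has to be selected.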

a

ID  CUSTOMER_NUMBER  NAME  STREET
1   4711             Dirk  Downstreet 4
2   4721             Hans  Mainstreet 5

b

ID  CUSTOMER_NUMBER  NAME   STREET
44  4711             Dirk   Downstreet 4   <== Duplicate
4   4741             Harry  Crossroad 9    <== new
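To illustrate why a plain UNION keeps the duplicate, here is a minimal sketch that feeds the two "Dirk" rows from the sample data into UNION as literals selected from Oracle's `dual` table. The literal rows only stand in for the real tables. UNION compares entire rows, so a difference in ID alone is enough to keep both.

```sql
-- The two rows differ only in ID, so UNION treats them as distinct and returns both.
SELECT 1 AS ID, 4711 AS CUSTOMER_NUMBER, 'Dirk' AS NAME, 'Downstreet 4' AS STREET FROM dual
UNION
SELECT 44, 4711, 'Dirk', 'Downstreet 4' FROM dual;

-- Without the ID column the rows are identical, so UNION collapses them into one.
SELECT 4711 AS CUSTOMER_NUMBER, 'Dirk' AS NAME, 'Downstreet 4' AS STREET FROM dual
UNION
SELECT 4711, 'Dirk', 'Downstreet 4' FROM dual;
```

Dropping ID is not an option here because the result has to carry it, which is why the deduplication has to happen on CUSTOMER_NUMBER instead.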

Expected result

ID  CUSTOMER_NUMBER  NAME   STREET        DATASOURCE
1   4711             Dirk   Downstreet 4  SAP         <== from a
2   4721             Hans   Mainstreet 5  SAP         <== from a
4   4741             Harry  Crossroad 9   MANUAL      <== from b

I am happy with the following simplified test:

SELECT CUSTOMER_NUMBER, 
    MAX(ID) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) ID,
    MAX(NAME) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) NAME,
    MAX(STREET) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) STREET
FROM 
    (SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
        UNION ALL
    SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
group by CUSTOMER_NUMBER

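But I have to select every field through DENSE_RANK FIRST ORDER BY DATASOURCE DESC, and there are about 20 fields...

Can anyone tell me a better way?

An alternative to using KEEP on every column is to use ROW_NUMBER: partition by the unique key, order the rows so that the preferred one comes first, and select only the rows numbered 1.

Example with CUSTOMER_NUMBER as the unique key, preferring MANUAL over SAP, and assuming ID is unique within each source: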

SELECT * FROM 
(
SELECT 
   "ID","CUSTOMER_NUMBER","NAME","STREET",
   row_number() over (partition by CUSTOMER_NUMBER order by decode(DATASOURCE,'SAP',2,'MANUAL',1), ID) as RN
FROM 
    (SELECT   "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
        UNION ALL
     SELECT   "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
) WHERE RN = 1

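This approach works even if an individual source contains duplicates. Adjust the ORDER BY columns so that the query stays deterministic, i.e. repeated runs return the same result (for example, if the ID column can repeat within SAP, add NAME).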
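The query above prefers MANUAL, while the expected result in the question keeps the SAP row for customer 4711. Below is a hedged sketch of the same ROW_NUMBER technique with the preference reversed. It also returns DATASOURCE, drops the helper column RN from the output, and adds NAME as a further tie-breaker. Using CASE instead of DECODE and adding the final ORDER BY are choices made for this sketch, not part of the original answer.

```sql
-- Sketch only: prefers SAP over MANUAL to match the expected result in the question.
SELECT ID, CUSTOMER_NUMBER, NAME, STREET, DATASOURCE
FROM (
    SELECT united.*,
           ROW_NUMBER() OVER (
               PARTITION BY CUSTOMER_NUMBER
               ORDER BY CASE DATASOURCE WHEN 'SAP' THEN 1 ELSE 2 END,  -- SAP rows rank first
                        ID,                                             -- tie-breaker within one source
                        NAME                                            -- further tie-breaker if ID can repeat
           ) AS RN
    FROM (
        SELECT ID, CUSTOMER_NUMBER, NAME, STREET, 'SAP' AS DATASOURCE FROM CUSTOMERS
        UNION ALL
        SELECT ID, CUSTOMER_NUMBER, NAME, STREET, 'MANUAL' AS DATASOURCE FROM CUSTOMERS_MANUAL
    ) united
)
WHERE RN = 1
ORDER BY CUSTOMER_NUMBER;
```

With the sample rows from the question, this keeps IDs 1 and 2 from SAP and ID 4 from MANUAL. If the real tables have around 20 columns, only the two inner select lists and the outer one grow; the ranking logic stays in one place.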

