Better design for eliminating duplicates after a UNION query
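I am designing a UNION query that merges two tables of customer information in an Oracle 11g database. The first table, a, is the "main" source; the second table, b, is an additional source that contains both new entries and duplicates.
In practice, UNION alone cannot eliminate the duplicates in b, because they differ in some fields, such as the auto-incremented ID, which has to be selected.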
Table a

ID  CUSTOMER_NUMBER  NAME   STREET
1   4711             Dirk   Downstreet 4
2   4721             Hans   Mainstreet 5

Table b

ID  CUSTOMER_NUMBER  NAME   STREET
44  4711             Dirk   Downstreet 4   <== Duplicate
4   4741             Harry  Crossroad 9    <== new

Expected result

ID  CUSTOMER_NUMBER  NAME   STREET        DATASOURCE
1   4711             Dirk   Downstreet 4  SAP     <== from a
2   4721             Hans   Mainstreet 5  SAP     <== from a
4   4741             Harry  Crossroad 9   MANUAL  <== from b
The following simplified test works for me:
SELECT CUSTOMER_NUMBER,
MAX(ID) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) ID,
MAX(NAME) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) NAME,
MAX(STREET) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) STREET
FROM
(SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
UNION ALL
SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
group by CUSTOMER_NUMBER
But I have to select every field through DENSE_RANK FIRST ORDER BY DATASOURCE DESC, and there are about 20 fields...
Can anyone suggest a better way?
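An alternative to applying KEEP to every column is to use ROW_NUMBER: partition by the unique key, order the rows by preference, and select only the rows numbered 1.
Example with CUSTOMER_NUMBER as the unique key, preferring MANUAL over SAP and expecting ID to be unique within each source (to prefer SAP instead, as in the expected result above, swap the two DECODE values):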
SELECT * FROM
(
SELECT
"ID","CUSTOMER_NUMBER","NAME","STREET",
row_number() over (partition by CUSTOMER_NUMBER order by decode(DATASOURCE,'SAP',2,'MANUAL',1), ID) as RN
FROM
(SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
UNION ALL
SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
) WHERE RN = 1
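This approach also works correctly when an individual source delivers duplicates. Adjust the ORDER BY columns so that the query stays deterministic, i.e. repeated runs return the same result (for example, add NAME if the ID column can contain duplicates within SAP).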
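To try the ROW_NUMBER approach without an Oracle instance, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions rather than part of the original posts: the function name dedupe_customers is hypothetical, SQLite stands in for Oracle (it needs SQLite 3.25 or later for window functions), a portable CASE expression replaces Oracle's DECODE, and the DATASOURCE column is carried through to the output so the result can be compared with the expected table.

```python
import sqlite3


def dedupe_customers(preferred="SAP"):
    """Return one row per CUSTOMER_NUMBER, taking the preferred source first.

    Hypothetical helper: SQLite is used only as a stand-in for Oracle here.
    """
    conn = sqlite3.connect(":memory:")
    # Sample data copied from the question (tables a and b).
    conn.executescript("""
        CREATE TABLE CUSTOMERS (ID INTEGER, CUSTOMER_NUMBER INTEGER, NAME TEXT, STREET TEXT);
        CREATE TABLE CUSTOMERS_MANUAL (ID INTEGER, CUSTOMER_NUMBER INTEGER, NAME TEXT, STREET TEXT);
        INSERT INTO CUSTOMERS VALUES
            (1, 4711, 'Dirk', 'Downstreet 4'),
            (2, 4721, 'Hans', 'Mainstreet 5');
        INSERT INTO CUSTOMERS_MANUAL VALUES
            (44, 4711, 'Dirk', 'Downstreet 4'),
            (4, 4741, 'Harry', 'Crossroad 9');
    """)
    query = """
        SELECT ID, CUSTOMER_NUMBER, NAME, STREET, DATASOURCE
        FROM (
            SELECT united.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY CUSTOMER_NUMBER
                       -- preferred source sorts first; ID breaks ties within a source
                       ORDER BY CASE WHEN DATASOURCE = ? THEN 1 ELSE 2 END, ID
                   ) AS RN
            FROM (
                SELECT ID, CUSTOMER_NUMBER, NAME, STREET, 'SAP' AS DATASOURCE FROM CUSTOMERS
                UNION ALL
                SELECT ID, CUSTOMER_NUMBER, NAME, STREET, 'MANUAL' AS DATASOURCE FROM CUSTOMERS_MANUAL
            ) united
        )
        WHERE RN = 1
        ORDER BY CUSTOMER_NUMBER
    """
    rows = conn.execute(query, (preferred,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in dedupe_customers("SAP"):
        print(row)
```

Calling dedupe_customers("SAP") keeps ID 1 for customer 4711, as in the expected result from the question, while dedupe_customers("MANUAL") keeps ID 44 instead, which corresponds to the ordering used in the answer's DECODE.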