Delete and Merge Records in SQL Server
I have a table like the one below.
id | firstname| lastname | email | homephone
-------------------------------------------------------
1 | aaa | bbb | xxx@yyy.com | 12344444
2 | aaa | bbb | null | null
3 | ccc | ddd | zzz@fff.com | null
4 | ccc | ddd | null | 34343322
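The problem is that I want to keep only one record per person. These rows are considered duplicates, and the null values should be merged, so the table should end up like this:
1 | aaa | bbb | xxx@yyy.com | 12344444
3 | ccc | ddd | zzz@fff.com | 34343322
So far I have managed to find the duplicates with the following code: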
Select
max(a.id) as original_id, b.id as DuplicateId,
a.firstname, b.firstname as dup_fname,
a.lastname, b.lastname as dup_lname,
a.email, b.email
From
tbl_xxx a
join
tbl_xxx b on a.firstname = b.firstname
and a.lastname = b.lastname
and a.email is null
and a.homephone is null
and b.email is null
and b.homephone is null
and a.id < b.id
Group by
b.id, a.firstname, b.firstname, a.lastname,
b.lastname, a.email, b.email, a.homephone, b.homephone
My merge query looks like this:
update tbl_xxx
SET
email = email ,
phone = phone
where
firstname = firstname
and lastname = lastname
and email is null
and phone is null
In the end, I would be left with the distinct rows.
Is my approach correct? Please suggest how to make the queries more efficient.
update tbl_tmpdupes3 SET
email = email ,
phone = phone ,
where
firstname=firstname
and lastname=lastname
and email is null
and homephone is null
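will do absolutely nothing, because the query does not compare the table with itself; it compares each row with itself. An UPDATE on its own will not work either, because you would still have the duplicates. What you want is to update the table against itself and then delete the duplicate rows entirely. So basically, we run a query that makes sure every non-null value from a duplicate is copied into the original row, and then we delete the row with the higher PK.
One way to solve the problem is to update one column at a time: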
-- Fill a missing email from another row with the same first and last name.
-- MIN() picks one non-null value per name; T-SQL has no LIMIT clause.
update t SET t.email = tmp.email
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MIN(email) AS email FROM tbl_xxx
WHERE email IS NOT NULL GROUP BY firstname, lastname)
AS tmp ON tmp.firstname = t.firstname AND tmp.lastname = t.lastname
WHERE t.email IS NULL;
-- Same for the phone column (named homephone in the question's table).
update t SET t.homephone = tmp.homephone
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MIN(homephone) AS homephone FROM tbl_xxx
WHERE homephone IS NOT NULL GROUP BY firstname, lastname)
AS tmp ON tmp.firstname = t.firstname AND tmp.lastname = t.lastname
WHERE t.homephone IS NULL;
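For each column, this query looks up rows with the same first and last name and copies a non-null value it finds into the empty field. So if the original row is missing data, the data gets filled in. This may not be 100% correct if two different people in the database share the same name; you have to take that into account.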
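The per-column updates could also be folded into one statement. The following is only a sketch under the question's table layout (`tbl_xxx` with `firstname`, `lastname`, `email`, `homephone`). It uses `MAX()` to pick one non-null value per name, which is an assumption about which value should win when several rows disagree.

```sql
-- Sketch only: table and column names follow the question's sample table.
-- MAX() ignores NULLs, so each name group yields one non-null value per
-- column (or NULL when no row in the group has one).
UPDATE t
SET    t.email     = COALESCE(t.email, s.email),
       t.homephone = COALESCE(t.homephone, s.homephone)
FROM   tbl_xxx AS t
JOIN  (SELECT firstname,
              lastname,
              MAX(email)     AS email,
              MAX(homephone) AS homephone
       FROM   tbl_xxx
       GROUP  BY firstname, lastname) AS s
       ON  s.firstname = t.firstname
       AND s.lastname  = t.lastname
WHERE  t.email IS NULL
   OR  t.homephone IS NULL;
```

`COALESCE` keeps any value a row already has, so only the null fields are overwritten.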
That said, follow it up with this query, which should delete only the identical rows with the higher PK.
DELETE FROM tbl_xxx WHERE tbl_xxx.id IN (
SELECT max(id) FROM tbl_xxx
GROUP BY tbl_xxx.firstname,tbl_xxx.lastname,tbl_xxx.homephone,tbl_xxx.email
HAVING count(tbl_xxx.id) > 1);
Edit: if there may be more than one duplicate per record, you can do this instead:
DELETE FROM tbl_xxx WHERE tbl_xxx.id NOT IN (
SELECT min(id) FROM tbl_xxx
GROUP BY tbl_xxx.firstname,tbl_xxx.lastname,tbl_xxx.homephone,tbl_xxx.email);
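The same "keep the lowest id in each group" idea can also be written with `ROW_NUMBER()`. This is a sketch rather than part of the original answer. It assumes the columns are named as in the question's table and that the null fields have already been filled, so that duplicate rows are identical.

```sql
-- Sketch only: number the rows inside each group of identical values,
-- lowest id first, then delete everything except the first row per group.
-- PARTITION BY treats NULLs as equal, so rows that are NULL in the same
-- column still land in the same group.
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY firstname, lastname, email, homephone
                              ORDER BY id) AS rn
    FROM   tbl_xxx
)
DELETE FROM ranked
WHERE  rn > 1;
```

SQL Server allows a delete through a CTE that reads from a single base table, so this removes the rows from `tbl_xxx` itself.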
You can use a MERGE statement for this. Try this sample.
create table temptable (id int, firstname varchar(50), lastname varchar(50), email varchar(50), homephone varchar(50))
insert into temptable values
(1,'aaa' , 'bbb', 'xxx@yyy.com', 1234444),
(2,'aaa' , 'bbb', null, null),
(3,'ccc' , 'ddd', 'abc@ddey.com', null),
(4,'ccc' , 'ddd', null, 34343322 )
select * from temptable
;with cte as
(
select firstname, lastname
,(select top 1 id from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and ( b.email is not null or b.homephone is not null) order by b.id) tid
,(select top 1 email from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.email is not null order by b.id) email
,(select top 1 homephone from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.homephone is not null order by b.id) homephone
from temptable a
group by firstname , lastname
)
--select * from cte
merge temptable as a
using cte as b
on ( a.id = b.tid )
when matched
then
update set a.email = b.email , a.homephone = b.homephone
when not matched by source then
delete ;
select * from temptable
drop table temptable
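Before running a destructive MERGE like the one above, it may help to preview the merged rows with a plain aggregate. The query below is a sketch against the sample `temptable`, so it has to run before the `drop table temptable` line. It assumes that the lowest id in each name group is the row to keep and that at most one distinct non-null value exists per column within a group.

```sql
-- Sketch only: preview one merged row per (firstname, lastname).
-- MIN() and MAX() ignore NULLs, so each column collapses to the
-- non-null value present in the group, if any.
SELECT MIN(id)        AS id,
       firstname,
       lastname,
       MAX(email)     AS email,
       MAX(homephone) AS homephone
FROM   temptable
GROUP  BY firstname, lastname
ORDER  BY MIN(id);
```

For the four sample rows this should return two rows, one for each name.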