Delete and Merge Records in SQL Server

I have a table as follows.

id | firstname|  lastname |  email       |   homephone
-------------------------------------------------------
1  | aaa      | bbb       | xxx@yyy.com  |  12344444    
2  | aaa      | bbb       | null         |  null    
3  | ccc      | ddd       | zzz@fff.com  | null  
4  | ccc      | ddd       | null         | 34343322  

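The problem is that I want to keep only one record per person, because these rows are considered duplicates, and fill its nulls from the other row, so that the table looks like this:

1  | aaa      | bbb       | xxx@yyy.com  |  12344444
3  | ccc      | ddd       | zzz@fff.com  |  34343322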

So far, I have managed to find the duplicates with the following code:

Select 
   max(a.id) as original_id, b.id as DuplicateId, 
   a.firstname, b.firstname as dup_fname,
   a.lastname, b.lastname as dup_lname,
   a.email, b.email as dup_email
From 
   tbl_xxx a 
join 
   tbl_xxx b on a.firstname = b.firstname
             and a.lastname = b.lastname
             and a.email is null
             and a.homephone is null
             and b.email is null
             and b.homephone is null
             and a.id < b.id   
Group by 
   b.id, a.firstname, b.firstname, a.lastname,
   b.lastname, a.email, b.email, a.homephone, b.homephone

My merge query looks like this:

update tbl_xxx 
SET
    email = email , 
    phone = phone  
where 
    firstname = firstname
    and lastname = lastname 
    and email is null 
    and phone is null 

In the end, I should be left with distinct rows.

Is my approach correct? Please suggest how to make the query more efficient.

update tbl_tmpdupes3 SET
email = email , 
phone = phone , 
where 
firstname=firstname
and lastname=lastname 
and email is null 
and homephone is null

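does absolutely nothing, because the query does not compare the table with itself; it compares each row with itself. An update on its own will not work either, since you will still have duplicates. What you want is to update the table against itself and then remove the duplicate rows entirely. So basically, we run a query that makes sure all non-null information from the duplicates is copied to the original, and then delete the rows with the higher pk.

One way to solve the problem is to update one column at a time: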

-- Fill in missing emails from another row with the same name
UPDATE t SET t.email = tmp.email
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MAX(email) AS email FROM tbl_xxx
      WHERE email IS NOT NULL GROUP BY firstname, lastname)
AS tmp ON tmp.firstname = t.firstname AND tmp.lastname = t.lastname
WHERE t.email IS NULL;

-- Fill in missing phone numbers the same way
UPDATE t SET t.homephone = tmp.homephone
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MAX(homephone) AS homephone FROM tbl_xxx
      WHERE homephone IS NOT NULL GROUP BY firstname, lastname)
AS tmp ON tmp.firstname = t.firstname AND tmp.lastname = t.lastname
WHERE t.homephone IS NULL;

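For each first name and last name pair, these queries look up a non-null value for the column and copy it into the empty fields. So if the original row is missing data, it gets filled in. This may not be 100% correct if two different people in the database share the same name; you have to take that into account.

Then run this query, which should delete only the identical rows with the higher pk.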
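Before merging, it may be worth listing the name pairs that carry more than one distinct non-null value, since those are the groups where two different people could share a name. The following is only a sketch; it assumes the table and column names from the question (tbl_xxx, email, homephone).

```sql
-- List name pairs whose rows disagree on email or phone.
-- COUNT(DISTINCT ...) ignores NULLs, so a group with one real value
-- plus NULLs is not reported; only genuinely conflicting groups are.
SELECT firstname,
       lastname,
       COUNT(DISTINCT email)     AS distinct_emails,
       COUNT(DISTINCT homephone) AS distinct_phones
FROM tbl_xxx
GROUP BY firstname, lastname
HAVING COUNT(DISTINCT email) > 1
    OR COUNT(DISTINCT homephone) > 1;
```

Any rows returned here would need a manual decision before the updates are run.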

DELETE FROM tbl_xxx WHERE tbl_xxx.id IN (
    SELECT max(id) FROM tbl_xxx 
    GROUP BY tbl_xxx.firstname,tbl_xxx.lastname,tbl_xxx.homephone,tbl_xxx.email
    HAVING count(tbl_xxx.id) > 1);

Edit: if there may be more than one duplicate per record, you can do the following instead:

DELETE FROM tbl_xxx WHERE tbl_xxx.id NOT IN (
    SELECT min(id) FROM tbl_xxx
    GROUP BY tbl_xxx.firstname,tbl_xxx.lastname,tbl_xxx.homephone,tbl_xxx.email);
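The same cleanup can also be expressed in SQL Server by numbering the rows within each group and deleting everything after the first. This is a sketch rather than part of the original answer; it assumes the question's table tbl_xxx with a homephone column, and the CTE name ranked is made up for illustration.

```sql
-- Number the rows inside each group of identical values, lowest id first,
-- then delete every row except the first one of its group.
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY firstname, lastname, homephone, email
               ORDER BY id
           ) AS rn
    FROM tbl_xxx
)
DELETE FROM ranked
WHERE rn > 1;
```

Like GROUP BY, PARTITION BY puts NULLs in the same group, so this should be run only after the updates have filled in the missing values.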

You can use a MERGE statement for this. Try this sample.

create table temptable (id int, firstname varchar(50),  lastname varchar(50),  email       varchar(50),   homephone varchar(50))

insert into temptable values
(1,'aaa' , 'bbb', 'xxx@yyy.com', 1234444),
(2,'aaa' , 'bbb', null, null),
(3,'ccc' , 'ddd', 'abc@ddey.com', null),
(4,'ccc' , 'ddd', null, 34343322  )

select * from temptable

;with cte as
(
    select firstname, lastname
    ,(select top 1 id from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and ( b.email is not null or b.homephone is not null) order by b.id) tid
    ,(select top 1 email from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.email is not null order by b.id)  email
    ,(select top 1 homephone from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.homephone is not null order by b.id) homephone
    from temptable a
    group by firstname , lastname
)

--select * from cte
merge temptable as a
using cte as b
on ( a.id = b.tid )
when matched 
    then 
    update set a.email = b.email , a.homephone = b.homephone    
when not matched by source then
    delete ;

select * from temptable

drop table temptable
