Deleting and merging records in SQL Server

Question

I have a table like the one below.

id | firstname|  lastname |  email       |   homephone
-------------------------------------------------------
1  | aaa      | bbb       | xxx@yyy.com  |  12344444    
2  | aaa      | bbb       | null         |  null    
3  | ccc      | ddd       | zzz@fff.com  | null  
4  | ccc      | ddd       | null         | 34343322

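These records count as duplicates, so I want to keep only one record per person and merge the null fields from the duplicates into it, so that the table ends up like this:

id | firstname|  lastname |  email       |   homephone
-------------------------------------------------------
1  | aaa      | bbb       | xxx@yyy.com  |  12344444
3  | ccc      | ddd       | zzz@fff.com  |  34343322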

So far I have managed to find the duplicates with the following code:

Select 
   max(a.id) as original_id, b.id as DuplicateId, 
   a.firstname, b.firstname as dup_fname,
   a.lastname, b.lastname as dup_lname,
   a.email, b.email as dup_email
From 
   tbl_xxx a 
join 
   tbl_xxx b on a.firstname = b.firstname
             and a.lastname = b.lastname
             and a.email is null
             and a.homephone is null
             and b.email is null
             and b.homephone is null
             and a.id < b.id
Group by 
   b.id, a.firstname, b.firstname, a.lastname,
   b.lastname, a.email, b.email, a.homephone, b.homephone

My merge query looks like this:

update tbl_xxx 
SET
    email = email , 
    homephone = homephone  
where 
    firstname = firstname
    and lastname = lastname 
    and email is null 
    and homephone is null

In the end I would be left with distinct rows.

Is my approach correct? Please suggest how to make the query more efficient.

Answer 1

update tbl_xxx SET
email = email , 
homephone = homephone
where 
firstname=firstname
and lastname=lastname 
and email is null 
and homephone is null

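This will never do anything, because the query is not comparing the table with itself; it compares each row with itself. An UPDATE on its own will not help either: you would still have the duplicate rows. What you want is to update the table against itself and then delete the duplicate data entirely. So, basically, we run a query that makes sure every non-null piece of duplicate information is copied into the original row, and then delete the rows with the higher PK value.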
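To make the "each row compared with itself" point concrete, here is a minimal sketch, assuming a scratch temp table `#people` (a hypothetical stand-in for `tbl_xxx`, seeded with the question's data) and SQL Server 2008 or later for the multi-row `VALUES` list. The unaliased UPDATE touches only row 2 and writes its own NULLs back; aliasing the table twice is what produces a genuine row-to-other-row comparison.

```sql
-- #people is a hypothetical stand-in for tbl_xxx, seeded with the question's data.
IF OBJECT_ID('tempdb..#people') IS NOT NULL DROP TABLE #people;

CREATE TABLE #people (
    id        int         PRIMARY KEY,
    firstname varchar(50) NOT NULL,
    lastname  varchar(50) NOT NULL,
    email     varchar(50) NULL,
    homephone varchar(50) NULL
);

INSERT INTO #people (id, firstname, lastname, email, homephone) VALUES
    (1, 'aaa', 'bbb', 'xxx@yyy.com', '12344444'),
    (2, 'aaa', 'bbb', NULL,          NULL),
    (3, 'ccc', 'ddd', 'zzz@fff.com', NULL),
    (4, 'ccc', 'ddd', NULL,          '34343322');

-- Unaliased: every column reference resolves to the row being updated, so
-- "firstname = firstname" is trivially true and "email = email" writes back
-- the NULL that is already there.
UPDATE #people
SET    email = email, homephone = homephone
WHERE  firstname = firstname
  AND  lastname  = lastname
  AND  email IS NULL
  AND  homephone IS NULL;

SELECT @@ROWCOUNT AS rows_touched;  -- 1 (row 2), and its values are still NULL

-- Aliased twice: each row is now compared with the other rows of the table.
SELECT a.id AS original_id, b.id AS duplicate_id
FROM   #people AS a
JOIN   #people AS b
  ON   b.firstname = a.firstname
 AND   b.lastname  = a.lastname
 AND   b.id        > a.id;          -- pairs (1, 2) and (3, 4)
```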

One way to solve the problem is to update one column at a time:

-- Fill empty emails from another row with the same name (MAX ignores NULLs).
UPDATE t SET t.email = tmp.email
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MAX(email) AS email
      FROM tbl_xxx
      WHERE email IS NOT NULL
      GROUP BY firstname, lastname) AS tmp
  ON tmp.firstname = t.firstname AND tmp.lastname = t.lastname
WHERE t.email IS NULL;

-- Same idea for the phone column.
UPDATE t SET t.homephone = tmp.homephone
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MAX(homephone) AS homephone
      FROM tbl_xxx
      WHERE homephone IS NOT NULL
      GROUP BY firstname, lastname) AS tmp
  ON tmp.firstname = t.firstname AND tmp.lastname = t.lastname
WHERE t.homephone IS NULL;

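For each column, the query looks up the rows with the same first and last name and copies a non-null value it finds into the empty field. So if the original row is missing that data, it gets filled in. This may not be 100% correct if two different people in the database share the same name, so you have to take that into account.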
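The two column-by-column updates can also be collapsed into a single pass. This is a minimal sketch, again assuming a hypothetical `#people` temp table in place of `tbl_xxx`: a grouped derived table uses `MAX` (which skips NULLs) to pick one non-null value per name and column, and `COALESCE` keeps any value a row already has. The same-name caveat applies unchanged.

```sql
-- #people is a hypothetical stand-in for tbl_xxx, seeded with the question's data.
IF OBJECT_ID('tempdb..#people') IS NOT NULL DROP TABLE #people;

CREATE TABLE #people (
    id        int         PRIMARY KEY,
    firstname varchar(50) NOT NULL,
    lastname  varchar(50) NOT NULL,
    email     varchar(50) NULL,
    homephone varchar(50) NULL
);

INSERT INTO #people (id, firstname, lastname, email, homephone) VALUES
    (1, 'aaa', 'bbb', 'xxx@yyy.com', '12344444'),
    (2, 'aaa', 'bbb', NULL,          NULL),
    (3, 'ccc', 'ddd', 'zzz@fff.com', NULL),
    (4, 'ccc', 'ddd', NULL,          '34343322');

-- g holds one non-null email and one non-null phone per name (MAX skips NULLs);
-- COALESCE only fills columns that are currently NULL.
UPDATE t
SET    t.email     = COALESCE(t.email,     g.email),
       t.homephone = COALESCE(t.homephone, g.homephone)
FROM   #people AS t
JOIN  (SELECT firstname, lastname,
              MAX(email)     AS email,
              MAX(homephone) AS homephone
       FROM   #people
       GROUP  BY firstname, lastname) AS g
  ON   g.firstname = t.firstname
 AND   g.lastname  = t.lastname
WHERE  t.email IS NULL OR t.homephone IS NULL;

-- Every row now has both email and homephone; duplicates are identical.
SELECT * FROM #people ORDER BY id;
```

After this, each pair of duplicates is identical in all four columns, which is what the GROUP BY in the DELETE queries relies on.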

Once that is done, go ahead and run this query, which should delete only the identical rows with the higher PK.

DELETE FROM tbl_xxx WHERE tbl_xxx.id IN (
    SELECT max(id) FROM tbl_xxx 
    GROUP BY tbl_xxx.firstname,tbl_xxx.lastname,tbl_xxx.homephone,tbl_xxx.email
    HAVING count(tbl_xxx.id) > 1);

Edit: if a person can have more than one duplicate, you can do this instead:

DELETE FROM tbl_xxx WHERE tbl_xxx.id NOT IN (
    SELECT min(id) FROM tbl_xxx
    GROUP BY tbl_xxx.firstname,tbl_xxx.lastname,tbl_xxx.homephone,tbl_xxx.email);
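A common alternative for the "keep the lowest id" step, swapped in here rather than taken from the answer, is to number the rows inside each name group with `ROW_NUMBER()` and delete every row after the first. A sketch, assuming a hypothetical `#people` temp table that is already in its post-fill state. Because it partitions on the name alone, it must only run after the null-filling step, and it carries the same namesake caveat.

```sql
-- #people is a hypothetical stand-in for tbl_xxx, in its state after the fill step.
IF OBJECT_ID('tempdb..#people') IS NOT NULL DROP TABLE #people;

CREATE TABLE #people (
    id        int         PRIMARY KEY,
    firstname varchar(50) NOT NULL,
    lastname  varchar(50) NOT NULL,
    email     varchar(50) NULL,
    homephone varchar(50) NULL
);

INSERT INTO #people (id, firstname, lastname, email, homephone) VALUES
    (1, 'aaa', 'bbb', 'xxx@yyy.com', '12344444'),
    (2, 'aaa', 'bbb', 'xxx@yyy.com', '12344444'),
    (3, 'ccc', 'ddd', 'zzz@fff.com', '34343322'),
    (4, 'ccc', 'ddd', 'zzz@fff.com', '34343322');

-- Rank rows within each name group, lowest id first, then delete all but
-- the first; SQL Server allows DELETE through a CTE over a single table.
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY firstname, lastname
                              ORDER BY id) AS rn
    FROM   #people
)
DELETE FROM ranked
WHERE  rn > 1;

SELECT * FROM #people ORDER BY id;  -- rows 1 and 3 remain
```

Unlike the `IN (SELECT max(id) ...)` version, this removes any number of duplicates per name in one statement.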

Answer 2

You can use a MERGE statement for this. Try this sample.

create table temptable (id int, firstname varchar(50),  lastname varchar(50),  email       varchar(50),   homephone varchar(50))

insert into temptable values
(1,'aaa' , 'bbb', 'xxx@yyy.com', 1234444),
(2,'aaa' , 'bbb', null, null),
(3,'ccc' , 'ddd', 'abc@ddey.com', null),
(4,'ccc' , 'ddd', null, 34343322  )

select * from temptable

;with cte as
(
    select firstname, lastname
    ,(select top 1 id from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and ( b.email is not null or b.homephone is not null) order by id) tid
    ,(select top 1 email from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.email is not null order by id)  email
    ,(select top 1 homephone from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.homephone is not null order by id) homephone
    from temptable a
    group by firstname , lastname
)

--select * from cte
merge temptable as a
using cte as b
on ( a.id = b.tid )
when matched 
    then 
    update set a.email = b.email , a.homephone = b.homephone    
when not matched by source then
    delete ;

select * from temptable

drop table temptable

2 solutions

Solution 1
0 votes, accepted, 2014-07-18 04:56:09

Solution 2
0 votes, 2014-07-18 05:32:15
