简体   繁体   中英

Delete duplicates based on Group By - SQL

EDIT: I think I now have the solution but need to do some more sense checking...

DELETE TBLFIRE_TEMP3 FROM TBLFIRE_TEMP3
LEFT OUTER JOIN (
   SELECT MIN(FireNo) as FireNo, ActionRef, FRADate, FIREUPRN
   FROM TBLFIRE_TEMP3 
   GROUP BY ActionRef, FRADate, FIREUPRN
) as KeepRows ON
   TBLFIRE_TEMP3.FireNo = KeepRows.FireNo
WHERE
   KeepRows.FireNo IS NULL

-############### Previous Comments ###############

I have a table which has duplicates in (based on three columns). I can find them and see them by doing the following and would then simply want to delete the duplicates (ie so all count(*) results are '1')

SELECT COUNT(*),ActionRef, FRADate, FIREUPRN
FROM TBLTempTable
GROUP BY ActionRef, FRADate, FIREUPRN

So I can see the count of how many times these groups occur. What I want to do is Delete the duplicates. I've tried the below but it deletes every row, even singular:

DELETE a FROM TblTempTable a JOIN
(
  SELECT ActionRef, FRADate, FIREUPRN
    FROM TblTempTable 
   GROUP BY ActionRef, FRADate, FIREUPRN
) d 
   ON (a.ActionRef = b.ActionRef
  AND a.FRADate = b.FRADate
AND a.FIREUPRN = b.FIREUPRN)

Based on the codes I've looked at the guide me I believe I am close but currently it deletes everything.

References: SQL- How can I remove duplicate rows? GROUP BY does not remove duplicates

-These are MySQL so not to relevant in the end:

select and delete rows within groups using mysql Find duplicate records in MySQL

A simple solution is to use a CTE with ROW_NUMBER :

WITH Data AS
(
    SELECT RN  = ROW_NUMBER() OVER (PARTITION BY ActionRef, FRADate, FIREUPRN
                                    ORDER BY FRADate ASC),
           Cnt = COUNT(*) OVER (PARTITION BY ActionRef, FRADate, FIREUPRN),
           ActionRef, FRADate, FIREUPRN
    FROM TBLTempTable
)
DELETE FROM Data
WHERE RN > 1

This deletes all but one, it keeps the oldest FRADate . You need to change the ORDER BY in ROW_NUMBER to change this logic.

One advantage of a CTE is that you can change it easily to see what you're going to delete (or update). Therefore you just have to replace DELETE FROM Data with SELECT * FROM Data .

There's a simpler method for readability too:

;WITH DEDUPE AS (
SELECT ROW_NUMBER() OVER(
    PARTITION BY ActionRef, FRADate, FIREUPRN
        ORDER BY (SELECT 1)) AS RN
FROM TBLTempTable)
DELETE FROM DEDUPE
WHERE RN != 1

We use this exact script at work on a daily basis. You can change the ORDER BY clause to any column, if you want to keep newer rows based on a date column etc.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM