
Delete duplicates based on Group By - SQL

EDIT: I think I now have the solution, but I need to do some more sense checking...

DELETE TBLFIRE_TEMP3 FROM TBLFIRE_TEMP3
LEFT OUTER JOIN (
   SELECT MIN(FireNo) as FireNo, ActionRef, FRADate, FIREUPRN
   FROM TBLFIRE_TEMP3 
   GROUP BY ActionRef, FRADate, FIREUPRN
) as KeepRows ON
   TBLFIRE_TEMP3.FireNo = KeepRows.FireNo
WHERE
   KeepRows.FireNo IS NULL

############### Previous Comments ###############

I have a table which has duplicates in it (based on three columns). I can find them and see them by doing the following, and would then simply like to delete the duplicates (i.e. so that all COUNT(*) results are 1):

SELECT COUNT(*),ActionRef, FRADate, FIREUPRN
FROM TBLTempTable
GROUP BY ActionRef, FRADate, FIREUPRN

So I can see the count of how many times these groups occur. What I want to do is delete the duplicates. I've tried the below, but it deletes every row, even those that occur only once:

DELETE a FROM TblTempTable a JOIN
(
  SELECT ActionRef, FRADate, FIREUPRN
    FROM TblTempTable 
   GROUP BY ActionRef, FRADate, FIREUPRN
) d
   ON (a.ActionRef = d.ActionRef
  AND a.FRADate = d.FRADate
  AND a.FIREUPRN = d.FIREUPRN)

Based on the code I've looked at to guide me, I believe I am close, but currently it deletes everything.
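
A likely reason the statement above removes every row: the derived table returns one row for every (ActionRef, FRADate, FIREUPRN) group, including groups that occur only once, so each row in the table finds a match (assuming the three columns contain no NULLs). A minimal sketch for listing only the groups that are actually duplicated, using the same table and column names as above:

```sql
-- List only the groups that occur more than once.
SELECT ActionRef, FRADate, FIREUPRN, COUNT(*) AS Cnt
FROM TblTempTable
GROUP BY ActionRef, FRADate, FIREUPRN
HAVING COUNT(*) > 1;
```

Joining the DELETE to this result would still remove every copy in a duplicated group, including the one to keep, so some way of picking a single survivor per group (for example MIN(FireNo), as in the edit at the top) is still needed.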

References:
- SQL - How can I remove duplicate rows?
- GROUP BY does not remove duplicates

These are for MySQL, so not too relevant in the end:
- select and delete rows within groups using mysql
- Find duplicate records in MySQL

A simple solution is to use a CTE with ROW_NUMBER:

WITH Data AS
(
    SELECT RN  = ROW_NUMBER() OVER (PARTITION BY ActionRef, FRADate, FIREUPRN
                                    ORDER BY FRADate ASC),
           Cnt = COUNT(*) OVER (PARTITION BY ActionRef, FRADate, FIREUPRN),
           ActionRef, FRADate, FIREUPRN
    FROM TBLTempTable
)
DELETE FROM Data
WHERE RN > 1

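This deletes all but one row in each group. Note that FRADate is also part of the PARTITION BY, so ordering by it does not distinguish between rows within a group, and which row survives is effectively arbitrary. To control which row is kept, change the ORDER BY in ROW_NUMBER to a column that differs between the duplicates.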
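
As a sketch of changing that ORDER BY, the variant below keeps the row with the lowest FireNo in each group. It assumes the table has a unique FireNo column, like TBLFIRE_TEMP3 in the question's edit; TBLTempTable may not have one, so treat the column name as hypothetical.

```sql
WITH Data AS
(
    SELECT RN = ROW_NUMBER() OVER (PARTITION BY ActionRef, FRADate, FIREUPRN
                                   ORDER BY FireNo ASC),  -- FireNo is an assumed column
           ActionRef, FRADate, FIREUPRN
    FROM TBLTempTable
)
DELETE FROM Data
WHERE RN > 1;  -- RN = 1 is the lowest FireNo in each group and is kept
```

Switching to ORDER BY FireNo DESC would keep the highest FireNo instead.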

One advantage of a CTE is that you can easily change it to see what you're going to delete (or update). To do that, you just have to replace DELETE FROM Data with SELECT * FROM Data.
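
For illustration, the preview version of the statement above might look like the following sketch. The final ORDER BY is an addition here, only to make the output easier to read.

```sql
WITH Data AS
(
    SELECT RN  = ROW_NUMBER() OVER (PARTITION BY ActionRef, FRADate, FIREUPRN
                                    ORDER BY FRADate ASC),
           Cnt = COUNT(*) OVER (PARTITION BY ActionRef, FRADate, FIREUPRN),
           ActionRef, FRADate, FIREUPRN
    FROM TBLTempTable
)
SELECT *
FROM Data
WHERE RN > 1   -- the rows that the DELETE would remove
ORDER BY ActionRef, FRADate, FIREUPRN, RN;
```

Cnt shows how many rows each group holds in total, so a group with Cnt = 3 should appear twice in this preview.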

There's a simpler, more readable method too:

;WITH DEDUPE AS (
SELECT ROW_NUMBER() OVER(
    PARTITION BY ActionRef, FRADate, FIREUPRN
        ORDER BY (SELECT 1)) AS RN
FROM TBLTempTable)
DELETE FROM DEDUPE
WHERE RN != 1

We use this exact script at work on a daily basis. You can change the ORDER BY clause to any column if, for example, you want to keep newer rows based on a date column.
