
How to eliminate duplicate rows in a table of a SQL Server 2005 database?

I am using SQL Server 2005 and have a table that contains duplicate rows. How can I eliminate those duplicate rows? For example, the table may contain 3 similar rows, of which I want to delete 2 and keep the original.

If you just want to eliminate duplicate records from your result set, you can use the DISTINCT keyword:

SELECT DISTINCT field1, field2 FROM...

If you want to delete the duplicate records, you can use COUNT to detect which records occur more than once, and then delete them with a subquery.
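
A minimal sketch of the COUNT-plus-subquery idea, assuming a table with a unique EmployeeId column in which two rows count as duplicates when EmployeeName and Age match (the sample Employee table used later in this thread). Keeping the lowest EmployeeId of each group is an assumption about which row is "the original".

```sql
-- Step 1: list each duplicated (EmployeeName, Age) combination and how often it occurs.
SELECT EmployeeName, Age, COUNT(*) AS Occurrences
FROM Employee
GROUP BY EmployeeName, Age
HAVING COUNT(*) > 1;

-- Step 2: delete every row that is not the lowest EmployeeId of its group.
-- Assumption: the lowest EmployeeId in a group is the row to keep.
DELETE FROM Employee
WHERE EmployeeId NOT IN (
    SELECT MIN(EmployeeId)
    FROM Employee
    GROUP BY EmployeeName, Age
);
```

Rows that have no duplicate are their own group minimum, so the DELETE leaves them untouched.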

First, copy one instance of each duplicated record into a temporary table:

SELECT fieldnames INTO #temp FROM table1 GROUP BY fieldnames HAVING COUNT(*) > 1

Then remove those records from the original table (IN accepts only a single column; to match on several columns, use a WHERE EXISTS subquery instead):

DELETE FROM table1 WHERE fieldnames IN (SELECT fieldnames FROM #temp)

Finally, copy the records from the temporary table back into the original table:

INSERT INTO table1 SELECT * FROM #temp

These steps leave a single copy of each duplicated record in the table.
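
The three steps can be sketched on a concrete table. The Contact table, its columns, and its sample rows are hypothetical, chosen to show the multi-column case where IN cannot be used. The sketch also assumes the columns contain no NULLs, since NULLs would not match in the equality comparison.

```sql
-- Hypothetical table with no key, so exact duplicate rows are possible.
CREATE TABLE Contact (FirstName varchar(50) NOT NULL, LastName varchar(50) NOT NULL);
INSERT INTO Contact (FirstName, LastName) VALUES ('Mark', 'Lee');
INSERT INTO Contact (FirstName, LastName) VALUES ('Mark', 'Lee');
INSERT INTO Contact (FirstName, LastName) VALUES ('Tom', 'Ray');
GO

-- 1. Keep one copy of each duplicated row in a temporary table.
SELECT FirstName, LastName
INTO #temp
FROM Contact
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1;

-- 2. Remove every copy of those rows from the original table.
DELETE c
FROM Contact AS c
WHERE EXISTS (
    SELECT 1
    FROM #temp AS t
    WHERE t.FirstName = c.FirstName
      AND t.LastName  = c.LastName
);

-- 3. Put the single copies back and clean up.
INSERT INTO Contact (FirstName, LastName)
SELECT FirstName, LastName FROM #temp;

DROP TABLE #temp;
```

Afterwards Contact holds one ('Mark', 'Lee') row and the untouched ('Tom', 'Ray') row. On a production table it would be prudent to run steps 2 and 3 inside a transaction so the rows are never missing if the script is interrupted.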

For illustration, let's take a simple table, Employee, with the following schema:

EmployeeId      int
EmployeeName    varchar(50)
Age             int
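
As a sketch, this schema could be created with a statement like the one below. Treating EmployeeId as the primary key is an assumption based on the remark about the primary key that follows; the NOT NULL constraints are likewise assumptions, as the original does not state nullability.

```sql
-- Assumption: EmployeeId is the primary key; nullability is not given in the original.
CREATE TABLE Employee
(
    EmployeeId   int         NOT NULL PRIMARY KEY,
    EmployeeName varchar(50) NOT NULL,
    Age          int         NOT NULL
)
GO
```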

Let's populate it with duplicate values. Note that the primary key is not duplicated in this case:

INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (1,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (2,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (3,'Sam',24)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (4,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (5,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (6,'Tom',22)
GO 

We can use CTEs (common table expressions) to find the duplicate rows. First, gather the duplicated values with a GROUP BY / COUNT query. Once the duplicates are identified, select the matching rows from the main table with a join. Then rank those rows and delete every row except the one with rank 1. I find this a lot more elegant.

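-- 1. (EmployeeName, Age) combinations that occur more than once
WITH TotalDuplicates(EmployeeName,Age,Total) AS 
(
    SELECT EmployeeName,Age,COUNT(EmployeeId) AS Total FROM Employee 
    GROUP BY EmployeeName,Age
    HAVING COUNT(EmployeeId) > 1
)
-- 2. All rows of the main table that belong to one of those combinations
,DistinctRows(EmployeeId,EmployeeName,Age) AS 
(
    SELECT E.EmployeeId,E.EmployeeName,E.Age FROM Employee AS E
    INNER JOIN TotalDuplicates AS D 
    ON (E.EmployeeName = D.EmployeeName AND E.Age = D.Age)
)
-- 3. Rank the rows within each combination. ORDER BY EmployeeId DESC gives rank 1
--    to the highest EmployeeId; use ASC to keep the lowest (original) row instead.
,OrderedDuplicateTables(EmployeeId,EmployeeName,Age,Ranking) AS 
(
    SELECT 
        EmployeeId,
        EmployeeName,
        Age,
        RANK() OVER (PARTITION BY EmployeeName, Age ORDER BY EmployeeId DESC) 
    FROM DistinctRows 
)

-- 4. Delete every duplicate except the row ranked 1
DELETE FROM Employee
WHERE EmployeeId IN (SELECT EmployeeId FROM OrderedDuplicateTables WHERE Ranking > 1)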
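
A more compact variant of the same ranking idea, offered as a sketch rather than as the method of the answer above: number the rows in each (EmployeeName, Age) group with ROW_NUMBER() and delete through the CTE, both of which SQL Server 2005 supports. The name NumberedRows is illustrative, and ordering by EmployeeId ascending is an assumption, chosen so that the lowest id in each group survives.

```sql
-- If this is not the first statement in the batch, the preceding
-- statement must be terminated with a semicolon before WITH.
WITH NumberedRows AS
(
    SELECT
        EmployeeId,
        -- Row 1 of each group is the lowest EmployeeId (assumed to be the original).
        ROW_NUMBER() OVER (PARTITION BY EmployeeName, Age
                           ORDER BY EmployeeId ASC) AS RowNum
    FROM Employee
)
-- Deleting from the CTE deletes the underlying Employee rows.
DELETE FROM NumberedRows
WHERE RowNum > 1;

-- Check what is left.
SELECT EmployeeId, EmployeeName, Age
FROM Employee
ORDER BY EmployeeId;
```

Run against the six sample rows inserted earlier, this keeps EmployeeId 1 (Mark), 2 (Tom) and 3 (Sam) and removes rows 4, 5 and 6.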


 