
SQL find row duplicates

I have a MySQL database similar to:

+----+---------+---------+------------------+....
| id | unique1 | unique2 |   genaric_data   |....
+----+---------+---------+------------------+....
| 0  |   100   |   1C7   | {data container} |....
+----+---------+---------+------------------+....
| 1  |   100   |   1C7   | {data container} |....
+----+---------+---------+------------------+....
| 2  |   100   |   1C8   | {data container} |....
+----+---------+---------+------------------+....
| 3  |   101   |   ---   | {data container} |....
+----+---------+---------+------------------+....
| 4  |   102   |   0     | {data container} |....
+----+---------+---------+------------------+....
| 5  |   103   |   1     | {data container} |....
.................................................

I need a way to add an extra column that gives the number of times each combination of the unique fields is used. I will then need to clean up the data manually.

I want a query to return:

+----+---------+---------+------+------------------+....
| id | unique1 | unique2 | dupe |   genaric_data   |....
+----+---------+---------+------+------------------+....
| 0  |   100   |   1C7   |   2  | {data container} |....
+----+---------+---------+------+------------------+....
| 1  |   100   |   1C7   |   2  | {data container} |....
+----+---------+---------+------+------------------+....
| 2  |   100   |   1C8   |   1  | {data container} |....
+----+---------+---------+------+------------------+....
| 3  |   101   |   ---   |   1  | {data container} |....
+----+---------+---------+------+------------------+....
| 4  |   102   |   0     |   1  | {data container} |....
+----+---------+---------+------+------------------+....
| 5  |   103   |   1     |   1  | {data container} |....
.......................................................

I have had this problem for a while, and currently my only solution is to export the data to Excel and find the duplicates there.

Thanks.

Edit: The possible duplicate is not a solution to my problem since when I execute:

SELECT *,count(*) FROM `database`
GROUP BY  `unique1`
HAVING count(*) > 1

In phpMyAdmin (all I'm allowed access to), it merges all rows with the same unique1 into one line.

The solution to your problem is to use GROUP BY:

SELECT unique1, unique2, Count(*) As colCount FROM YourTable
GROUP BY unique1, unique2
HAVING Count(*) > 1

This will return all combinations of unique1 and unique2 that occur more than once.

In a second step, you can build a query that returns all affected rows.

SELECT YourTable.*, rstDuplicates.colCount 
FROM YourTable INNER JOIN (
  SELECT unique1, unique2, Count(*) As colCount FROM YourTable
  GROUP BY unique1, unique2
  HAVING Count(*) > 1
) As rstDuplicates ON YourTable.unique1 = rstDuplicates.unique1 And YourTable.unique2 = rstDuplicates.unique2

This will output all rows that have at least one duplicate. The colCount column shows the number of appearances.
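
To see what the two steps return, here is a minimal sketch that runs them against the sample rows from the question. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the queries use only standard SQL), keeps the placeholder table name YourTable, and leaves out the generic data column for brevity.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, unique1 INTEGER, unique2 TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO YourTable (id, unique1, unique2) VALUES (?, ?, ?)",
    [(0, 100, "1C7"), (1, 100, "1C7"), (2, 100, "1C8"),
     (3, 101, "---"), (4, 102, "0"), (5, 103, "1")],
)

# Step 1: combinations of unique1 and unique2 that occur more than once.
duplicate_keys = conn.execute("""
    SELECT unique1, unique2, COUNT(*) AS colCount
    FROM YourTable
    GROUP BY unique1, unique2
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: every row that belongs to one of those combinations.
duplicate_rows = conn.execute("""
    SELECT YourTable.id, YourTable.unique1, YourTable.unique2,
           rstDuplicates.colCount
    FROM YourTable
    INNER JOIN (
        SELECT unique1, unique2, COUNT(*) AS colCount
        FROM YourTable
        GROUP BY unique1, unique2
        HAVING COUNT(*) > 1
    ) AS rstDuplicates
      ON YourTable.unique1 = rstDuplicates.unique1
     AND YourTable.unique2 = rstDuplicates.unique2
    ORDER BY YourTable.id
""").fetchall()

print(duplicate_keys)   # [(100, '1C7', 2)]
print(duplicate_rows)   # [(0, 100, '1C7', 2), (1, 100, '1C7', 2)]
```

Because of the HAVING clause and the inner join, rows whose combination appears only once (ids 2 to 5 in the sample) are not part of this result.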

If you want to add a field with the information, a correlated subquery is perhaps the easiest way:

select t.*,
       (select count(*)
        from YourTable t2
        where t2.unique1 = t.unique1 and t2.unique2 = t.unique2
       ) as dupecnt
from YourTable t;

Sometimes this is efficient (with an index on unique1, unique2). Sometimes it is more efficient to do the aggregation in the from clause:

select t.*, t2.dupecnt
from YourTable t join
     (select unique1, unique2, count(*) as dupecnt
      from YourTable t2
      group by unique1, unique2
     ) t2
     on t2.unique1 = t.unique1 and t2.unique2 = t.unique2;
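
As a rough check, the sketch below runs both variants on the sample rows from the question and compares the results. It assumes Python's built-in sqlite3 module in place of MySQL, uses the placeholder table name YourTable, and the index name idx_unique1_unique2 is made up for the example. The join condition compares unique2 with unique2, which is what produces the per-combination counts the question asks for.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, unique1 INTEGER, unique2 TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable (id, unique1, unique2) VALUES (?, ?, ?)",
    [(0, 100, "1C7"), (1, 100, "1C7"), (2, 100, "1C8"),
     (3, 101, "---"), (4, 102, "0"), (5, 103, "1")],
)
# Optional composite index that the correlated subquery can use
# (hypothetical index name).
conn.execute("CREATE INDEX idx_unique1_unique2 ON YourTable (unique1, unique2)")

# Variant 1: correlated subquery, evaluated once per outer row.
correlated = conn.execute("""
    SELECT t.id, t.unique1, t.unique2,
           (SELECT COUNT(*)
            FROM YourTable t2
            WHERE t2.unique1 = t.unique1 AND t2.unique2 = t.unique2) AS dupecnt
    FROM YourTable t
    ORDER BY t.id
""").fetchall()

# Variant 2: aggregate once in a derived table, then join back.
joined = conn.execute("""
    SELECT t.id, t.unique1, t.unique2, t2.dupecnt
    FROM YourTable t
    JOIN (SELECT unique1, unique2, COUNT(*) AS dupecnt
          FROM YourTable
          GROUP BY unique1, unique2) t2
      ON t2.unique1 = t.unique1 AND t2.unique2 = t.unique2
    ORDER BY t.id
""").fetchall()

for row in joined:
    print(row)
```

Both variants keep every row, so unduplicated rows show a count of 1, matching the dupe column in the question's desired output.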


 