
DELETE duplicates from a table when two columns must be considered

We have a table set up with an auto-increment ID, a ClassID, and a StudentID. The ClassID identifies the class the student is taking. Sometimes our system creates duplicate rows for the same student in the same class. We are currently trying to fix that problem; it might have to do with users hitting the back button.

Students often go on to take the next class, so we don't want to delete every repeated StudentID. We only want to delete duplicate students within the same ClassID.
For example:

ID | ClassID | StudentID
1  |   1     |     1
2  |   2     |     1
3  |   2     |     1
4  |   2     |     2
5  |   2     |     2

I want to delete IDs 3 and 5. I have searched the Internet for an answer and can't seem to find one. The closest I've found is grouping, but how do I group by ClassID and find the duplicates within each ClassID group?
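One way to find "duplicates within each ClassID" is to compare every row with the earlier rows that share both ClassID and StudentID. The sketch below is a self-join, which is a different technique from the GROUP BY answers that follow. It assumes Python's built-in `sqlite3` module as a stand-in for the real database. The table name `your_table` and the helpers `make_sample_table` and `ids_to_delete` are hypothetical.

```python
import sqlite3


def make_sample_table():
    """Build the question's example table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table ("
        " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " ClassID INTEGER NOT NULL,"
        " StudentID INTEGER NOT NULL)"
    )
    # Inserted in order, so the auto-increment IDs come out as 1..5.
    conn.executemany(
        "INSERT INTO your_table (ClassID, StudentID) VALUES (?, ?)",
        [(1, 1), (2, 1), (2, 1), (2, 2), (2, 2)],
    )
    return conn


def ids_to_delete(conn):
    """IDs of rows that have an earlier row with the same ClassID and StudentID."""
    rows = conn.execute(
        "SELECT DISTINCT later.ID"
        " FROM your_table AS later"
        " JOIN your_table AS earlier"
        "   ON earlier.ClassID = later.ClassID"
        "  AND earlier.StudentID = later.StudentID"
        "  AND earlier.ID < later.ID"
        " ORDER BY later.ID"
    ).fetchall()
    return [row[0] for row in rows]


print(ids_to_delete(make_sample_table()))  # [3, 5]
```

Because the join condition requires both columns to match, student 1 in class 1 (ID 1) is never paired with student 1 in class 2. Only the later copy of each pair is flagged.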

I read an interesting article about something like this. As is well known, it is bad practice to use a query like this one just to remove duplicates from a result set:

SELECT ClassID, StudentID
FROM your_table
GROUP BY ClassID, StudentID;
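To make the point about GROUP BY versus DISTINCT concrete, the sketch below runs both queries on the question's sample rows. It assumes an in-memory SQLite database via Python's `sqlite3` in place of the real one, and `your_table` is the answer's placeholder name. Both queries only change the result set. Neither one removes rows from the table.

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
)

# GROUP BY with no aggregate: one output row per distinct (ClassID, StudentID) pair.
grouped = conn.execute(
    "SELECT ClassID, StudentID FROM your_table"
    " GROUP BY ClassID, StudentID ORDER BY ClassID, StudentID"
).fetchall()

# DISTINCT expresses the same intent more directly.
distinct = conn.execute(
    "SELECT DISTINCT ClassID, StudentID FROM your_table"
    " ORDER BY ClassID, StudentID"
).fetchall()

print(grouped)              # [(1, 1), (2, 1), (2, 2)]
print(grouped == distinct)  # True

# The table itself still holds all five rows.
row_count = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]
print(row_count)  # 5
```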

In this case, DISTINCT would be the better solution. However, a poor query like the one above can be a good starting point for building a useful one. First, let's select the pairs that are duplicated:

SELECT ClassID, StudentID
FROM your_table
GROUP BY ClassID, StudentID
HAVING COUNT(*) > 1;
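Running that HAVING query on the question's data shows which pairs are repeated and how often. The sketch assumes SQLite through Python's `sqlite3` as a stand-in, with `your_table` as the placeholder name. It adds `COUNT(*) AS copies` to the select list only so the counts are visible.

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
)

# Keep only the groups that contain more than one row.
duplicates = conn.execute(
    "SELECT ClassID, StudentID, COUNT(*) AS copies"
    " FROM your_table"
    " GROUP BY ClassID, StudentID"
    " HAVING COUNT(*) > 1"
    " ORDER BY ClassID, StudentID"
).fetchall()

print(duplicates)  # [(2, 1, 2), (2, 2, 2)]
```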

In MySQL, a DELETE cannot use a subquery that selects from the same table it is deleting from (error 1093). You have to go through a temporary table instead. The full code is:

CREATE TEMPORARY TABLE keep_lines AS 
    SELECT MAX(id) AS id_to_keep -- you can use MIN if wanted
    FROM your_table
    GROUP BY ClassID, StudentID;

DELETE FROM your_table
WHERE id NOT IN (SELECT id_to_keep
                 FROM keep_lines);

DROP TABLE keep_lines;
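The three statements above can be exercised end to end on the question's sample data. The sketch below assumes SQLite through Python's `sqlite3` as a stand-in for MySQL. SQLite would actually accept the subquery on the same table directly, so the temporary table is kept here only to mirror the answer. The helper names `make_sample_table` and `dedupe_with_keep_lines` are hypothetical.

```python
import sqlite3


def make_sample_table():
    """Hypothetical in-memory copy of the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
    )
    conn.commit()
    return conn


def dedupe_with_keep_lines(conn):
    """Keep the highest ID per (ClassID, StudentID) pair, as in the answer above."""
    conn.executescript(
        """
        CREATE TEMPORARY TABLE keep_lines AS
            SELECT MAX(ID) AS id_to_keep  -- MIN(ID) would keep the oldest row instead
            FROM your_table
            GROUP BY ClassID, StudentID;

        DELETE FROM your_table
        WHERE ID NOT IN (SELECT id_to_keep FROM keep_lines);

        DROP TABLE keep_lines;
        """
    )


conn = make_sample_table()
dedupe_with_keep_lines(conn)
remaining = [row[0] for row in conn.execute("SELECT ID FROM your_table ORDER BY ID")]
print(remaining)  # [1, 3, 5]
```

With MAX the newest copy of each pair survives, so IDs 2 and 4 are deleted rather than 3 and 5. Swapping in MIN gives the question's expected result of keeping IDs 1, 2 and 4.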

Then, as many others have said, add a UNIQUE constraint on (ClassID, StudentID) to your table!
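In MySQL the constraint would be something like `ALTER TABLE your_table ADD UNIQUE KEY ux_class_student (ClassID, StudentID);`, where the key name is hypothetical. The sketch below shows the same idea in SQLite via Python's `sqlite3`, which uses `CREATE UNIQUE INDEX` instead. It illustrates two points: the index cannot be created while duplicates remain, and once it exists, a repeated enrollment is rejected by the database itself. The helper name `try_add_unique` is hypothetical.

```python
import sqlite3


def try_add_unique(conn):
    """Try to enforce one row per (ClassID, StudentID); return False if duplicates block it."""
    try:
        conn.execute(
            "CREATE UNIQUE INDEX ux_class_student ON your_table (ClassID, StudentID)"
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)", [(1, 1, 1), (2, 2, 1), (3, 2, 1)]
)
conn.commit()

first_attempt = try_add_unique(conn)   # False: rows 2 and 3 still collide
conn.execute("DELETE FROM your_table WHERE ID = 3")  # clean up the duplicate first
second_attempt = try_add_unique(conn)  # True

# From now on, a second (ClassID=2, StudentID=1) row is refused.
try:
    conn.execute("INSERT INTO your_table (ClassID, StudentID) VALUES (2, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(first_attempt, second_attempt, rejected)  # False True True
```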

You cannot DELETE or UPDATE rows in a table while selecting from that same table in a subquery. So you will need either to create a temporary table to use as the reference, or to write a PHP script that fires off a DELETE command for the matching IDs.
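The script route can be sketched as two steps: read the redundant IDs into application memory, then delete them by ID. The sketch below swaps PHP for Python and uses an in-memory SQLite database as a stand-in. `the_table` follows the answer's example, and `delete_redundant_ids` is a hypothetical helper. It keeps the lowest ID of each pair.

```python
import sqlite3


def delete_redundant_ids(conn):
    """Application-side cleanup: find redundant IDs first, then DELETE them by ID."""
    # IDs to keep: the lowest ID in every (ClassID, StudentID) group.
    keep = {
        row[0]
        for row in conn.execute(
            "SELECT MIN(ID) FROM the_table GROUP BY ClassID, StudentID"
        )
    }
    all_ids = [row[0] for row in conn.execute("SELECT ID FROM the_table ORDER BY ID")]
    redundant = [i for i in all_ids if i not in keep]

    # The IDs are already in application memory, so each DELETE
    # no longer needs a subquery on the table it deletes from.
    conn.executemany("DELETE FROM the_table WHERE ID = ?", [(i,) for i in redundant])
    conn.commit()
    return redundant


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
)

removed = delete_redundant_ids(conn)
remaining = [row[0] for row in conn.execute("SELECT ID FROM the_table ORDER BY ID")]
print(removed, remaining)  # [3, 5] [1, 2, 4]
```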

Here is an example SQL query though:

SELECT MIN(ID) AS minID, ClassID, StudentID
FROM the_table GROUP BY ClassID, StudentID HAVING COUNT(StudentID) > 1

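Deleting the IDs this query returns removes one duplicate from each repeated (ClassID, StudentID) pair and keeps the most recent row. You could run it multiple times, and it would continue removing duplicates until none are left.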
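The query itself only selects IDs. One reading of "run this multiple times" is a loop: delete the returned `minID` values, then repeat until the query comes back empty. The sketch below shows that reading, assuming SQLite through Python's `sqlite3` as a stand-in for MySQL. `the_table` follows the answer, and `make_sample_table` and `delete_one_duplicate_per_pass` are hypothetical helpers.

```python
import sqlite3


def make_sample_table():
    """The question's sample rows in an in-memory SQLite table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO the_table VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
    )
    return conn


def delete_one_duplicate_per_pass(conn):
    """Repeatedly delete the lowest ID of every still-duplicated pair; return the pass count."""
    passes = 0
    while True:
        rows = conn.execute(
            "SELECT MIN(ID) AS minID FROM the_table"
            " GROUP BY ClassID, StudentID HAVING COUNT(StudentID) > 1"
        ).fetchall()
        if not rows:  # no pair has more than one row any more
            return passes
        # Each row is a 1-tuple (minID,), which is exactly the parameter shape needed.
        conn.executemany("DELETE FROM the_table WHERE ID = ?", rows)
        passes += 1


conn = make_sample_table()
passes = delete_one_duplicate_per_pass(conn)
remaining = [row[0] for row in conn.execute("SELECT ID FROM the_table ORDER BY ID")]
print(passes, remaining)  # 1 [1, 3, 5]
```

Each pass removes only one row per pair, so a pair stored three times needs two passes. The HAVING clause guarantees the last copy is never deleted.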

You can use the following SQL statements to remove all but the earliest row of each (ClassID, StudentID) pair:

create temporary table unique_ids as
select min(id) as ID
  from some_table
 group by ClassID, StudentID;

delete some_table
  from some_table
       left join unique_ids using (id)
 where unique_ids.id is null;

If you're operating on a large table, consider adding an index on unique_ids.id after creating the temporary table.
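SQLite does not support MySQL's multi-table `DELETE ... FROM ... LEFT JOIN` syntax. The sketch below therefore expresses the same anti-join ("delete rows with no match in `unique_ids`") with `NOT EXISTS`, and it adds the suggested index on the temporary table. It assumes an in-memory SQLite database via Python's `sqlite3` as a stand-in. `some_table` and `unique_ids` follow the answer, while `idx_unique_ids` is a hypothetical index name.

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
)
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
)

conn.executescript(
    """
    CREATE TEMPORARY TABLE unique_ids AS
        SELECT MIN(ID) AS ID
        FROM some_table
        GROUP BY ClassID, StudentID;

    -- Index the temporary table so each lookup below can use the index.
    CREATE INDEX temp.idx_unique_ids ON unique_ids (ID);

    -- Anti-join: delete every row that has no matching entry in unique_ids.
    DELETE FROM some_table
    WHERE NOT EXISTS (
        SELECT 1 FROM unique_ids WHERE unique_ids.ID = some_table.ID
    );
    """
)

remaining = [row[0] for row in conn.execute("SELECT ID FROM some_table ORDER BY ID")]
print(remaining)  # [1, 2, 4]
```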

You can find another approach here. But a unique key constraint on ClassID and StudentID is something you definitely need.

I'd strongly suggest a solution using a temporary table: easy, fast, and no hassle with complex queries. Create a similar table (maybe ENGINE=MEMORY for speed), insert the unique rows into it with a simple SELECT DISTINCT ClassID, StudentID (leave out the auto-increment ID, which differs on every row and would make DISTINCT keep everything), truncate the original table, and replace its data with the data from the temporary table.

Of course, this only works for databases which can be taken out of production for the duration.
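A sketch of the copy, empty and reload approach follows, assuming SQLite through Python's `sqlite3` as a stand-in for MySQL. It is a variation on the answer. Instead of a plain `SELECT DISTINCT`, it groups by the pair and keeps `MIN(ID)`, so the surviving rows keep their original IDs. SQLite has no `TRUNCATE`, so an unqualified `DELETE` empties the table, and the whole swap runs in one transaction. `your_table`, `dedup` and `rebuild_without_duplicates` are hypothetical names.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Copy one row per pair aside, empty the table, and reload it in one transaction."""
    conn.executescript(
        """
        BEGIN;
        -- One row per (ClassID, StudentID), keeping its lowest original ID.
        CREATE TEMPORARY TABLE dedup AS
            SELECT MIN(ID) AS ID, ClassID, StudentID
            FROM your_table
            GROUP BY ClassID, StudentID;
        -- SQLite has no TRUNCATE; DELETE without WHERE empties the table.
        DELETE FROM your_table;
        INSERT INTO your_table (ID, ClassID, StudentID)
            SELECT ID, ClassID, StudentID FROM dedup;
        DROP TABLE dedup;
        COMMIT;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (ID INTEGER PRIMARY KEY, ClassID INTEGER, StudentID INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2)],
)

rebuild_without_duplicates(conn)
rows = conn.execute("SELECT ID, ClassID, StudentID FROM your_table ORDER BY ID").fetchall()
print(rows)  # [(1, 1, 1), (2, 2, 1), (4, 2, 2)]
```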

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.
