Delete all duplicates except one

Question

We have a table business_users with columns user_id and business_id, and it contains duplicates. How can I write a query that deletes all duplicates except one?

Answer 1

Completely identical rows

If the duplicates are completely identical rows, which is how I first understood your question, you can select the distinct rows into a separate table and rebuild the table data from it.

CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

Be careful if any foreign key constraints reference this table, though, as temporarily deleting the rows might trigger cascaded deletions elsewhere.
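To try the rebuild on throwaway data before touching a real table, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an in-memory SQLite database as a stand-in for the real server, and the sample rows are invented. SQLite requires the AS keyword in CREATE TABLE ... AS SELECT, so that line differs slightly from the statement above.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption, not the original server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_users (user_id INTEGER, business_id INTEGER)")
conn.executemany(
    "INSERT INTO business_users VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (2, 20), (2, 20), (3, 30)],  # invented sample rows
)

# Same four steps as above: copy distinct rows out, empty the table, copy back.
conn.executescript("""
    CREATE TEMPORARY TABLE tmp AS SELECT DISTINCT * FROM business_users;
    DELETE FROM business_users;
    INSERT INTO business_users SELECT * FROM tmp;
    DROP TABLE tmp;
""")

rows = conn.execute(
    "SELECT user_id, business_id FROM business_users ORDER BY user_id"
).fetchall()
print(rows)
```

Each distinct (user_id, business_id) pair should appear once afterwards.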

Introducing a unique constraint

If you only care about pairs of user_id and business_id, you probably also want to prevent new duplicates in the future. You can move the existing data to a temporary table, add a unique constraint, and then move the data back, ignoring duplicates.

CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

The above is based on this answer. The warning about foreign keys applies here just as it did in the previous section.
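The same idea can be sketched against an in-memory SQLite database through Python's sqlite3 module. This is an approximation under stated assumptions: SQLite has no ALTER TABLE ... ADD UNIQUE, so a unique index (with the hypothetical name uq_user_business) stands in for the constraint, and INSERT OR IGNORE replaces INSERT IGNORE. The sample rows are invented.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption, not the original server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_users (user_id INTEGER, business_id INTEGER)")
conn.executemany(
    "INSERT INTO business_users VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20)],  # invented sample rows
)

conn.executescript("""
    CREATE TEMPORARY TABLE tmp AS SELECT * FROM business_users;
    DELETE FROM business_users;
    -- SQLite substitute for: ALTER TABLE business_users ADD UNIQUE (user_id, business_id)
    CREATE UNIQUE INDEX uq_user_business ON business_users (user_id, business_id);
    -- SQLite substitute for: INSERT IGNORE
    INSERT OR IGNORE INTO business_users SELECT * FROM tmp;
    DROP TABLE tmp;
""")

rows = conn.execute(
    "SELECT user_id, business_id FROM business_users ORDER BY user_id"
).fetchall()

# A later attempt to insert an existing pair is rejected by the unique index.
try:
    conn.execute("INSERT INTO business_users VALUES (1, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

The final insert attempt illustrates why the constraint is worth adding: once it is in place, the database itself refuses new duplicates.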

One-shot removal

If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, you can try the following:

DELETE FROM business_users WHERE id NOT IN
    (SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);

A similar idea was previously suggested by this answer.

If the above query fails because you are not allowed to read from and delete from the same table in a single statement, you can again use a temporary table:

CREATE TEMPORARY TABLE tmp
SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;

If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.
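Before running a DELETE like this, it can help to preview which ids it would remove. The sketch below does that against an in-memory SQLite database through Python's sqlite3 module. The database, the id values, and the sample rows are all assumptions made for illustration. SQLite accepts the single-statement form directly, so the temporary table is not needed here.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption, not the original server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_users ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, business_id INTEGER)"
)
conn.executemany(
    "INSERT INTO business_users (id, user_id, business_id) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 10), (3, 2, 20), (4, 1, 10), (5, 2, 20), (6, 3, 30)],
)

# Rows whose id is not the smallest id of their (user_id, business_id) group.
not_keeper = (
    "id NOT IN (SELECT MIN(id) FROM business_users "
    "GROUP BY user_id, business_id)"
)

# Preview: the ids the DELETE is about to remove.
doomed = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM business_users WHERE {not_keeper} ORDER BY id"
    )
]

# The one-shot removal itself.
conn.execute(f"DELETE FROM business_users WHERE {not_keeper}")

kept = conn.execute(
    "SELECT id, user_id, business_id FROM business_users ORDER BY id"
).fetchall()
print(doomed, kept)
```

Using MIN(id) keeps the oldest row of each group, assuming ids increase with insertion order; MAX(id) would keep the newest instead.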

Answer 2

Since you have a primary key, you can use it to pick which rows to keep:

delete from business_users
where id not in (
    select id from (
        select min(id) as id -- Make a list of the primary keys to keep
        from business_users
        group by user_id, business_id -- Group by your duplicated row definition
    ) as a -- Derived table to force an implicit temp table
);

This way, you don't need to create and drop temporary tables yourself (apart from the implicit one).

You might want to put a unique constraint on (user_id, business_id) so you don't have to worry about this again.

Answer 1: 9, 2012-09-18 18:44:30
Answer 2: 3, 2012-09-18 19:04:45
