Delete all the duplicates except one

We have a table business_users with the columns user_id and business_id, and it contains duplicates. How can I write a query that deletes all duplicates except for one?

Completely identical rows

If you want to get rid of completely identical rows, as I first understood your question, you can select the distinct rows into a separate table and rebuild the table data from that.

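-- Copy one instance of each distinct row into a temporary table
CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
-- Empty the original table, then refill it from the deduplicated copy
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;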

Be careful if any foreign key constraints reference this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
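To check for the foreign keys mentioned above before deleting anything, something like the following might help. It is only a sketch and assumes MySQL with the referencing tables living in the current schema; the query reads the standard information_schema.REFERENTIAL_CONSTRAINTS view and changes nothing.

```sql
-- List foreign keys that point at business_users, with their ON DELETE rule.
-- Assumption: the referencing tables are in the current database.
SELECT CONSTRAINT_NAME,
       TABLE_NAME,      -- the table holding the foreign key
       DELETE_RULE      -- e.g. CASCADE, SET NULL, RESTRICT, NO ACTION
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = DATABASE()
  AND REFERENCED_TABLE_NAME = 'business_users';
```

Any row reported with DELETE_RULE = 'CASCADE' is a table that would lose rows during the temporary DELETE.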

Introducing a unique constraint

If you only care about pairs of user_id and business_id, you probably also want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a unique constraint, and then move the data back, ignoring duplicates.

CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

The above is based on another answer. The warning about foreign keys applies here just as it did in the previous section.
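To see how INSERT IGNORE interacts with the unique constraint, here is a small self-contained sketch. The table demo_pairs and its values are made up for illustration and are not part of the original question; MySQL is assumed.

```sql
-- Hypothetical demo table, standing in for business_users
CREATE TEMPORARY TABLE demo_pairs (
    user_id     INT NOT NULL,
    business_id INT NOT NULL,
    UNIQUE (user_id, business_id)
);

-- The second (1, 10) violates the unique key; IGNORE turns the
-- error into a warning and skips that row instead of aborting.
INSERT IGNORE INTO demo_pairs (user_id, business_id)
VALUES (1, 10), (1, 10), (2, 10);

-- Two rows remain: (1, 10) and (2, 10)
SELECT user_id, business_id FROM demo_pairs ORDER BY user_id;

DROP TEMPORARY TABLE demo_pairs;
```

Note that a MySQL UNIQUE index permits multiple NULL values, so if either column is nullable, rows containing NULL in it would not be treated as duplicates by this approach.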

One-shot removal

If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:

DELETE FROM business_users WHERE id NOT IN
    (SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);

A similar idea was previously suggested by another answer.
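Before running the DELETE, it may be worth previewing what it would do. The read-only query below is a sketch that lists every duplicated pair, how many copies exist, and which id would survive; the column aliases copies and kept_id are names chosen here for illustration.

```sql
-- Preview: which (user_id, business_id) pairs are duplicated,
-- and which row (the lowest id) the DELETE would keep.
SELECT user_id,
       business_id,
       COUNT(*) AS copies,
       MIN(id)  AS kept_id
FROM business_users
GROUP BY user_id, business_id
HAVING COUNT(*) > 1;
```

If this returns no rows, there is nothing to delete.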

If the above query fails because you are not allowed to read from and delete from the same table in one statement (MySQL reports this as error 1093, "You can't specify target table ... for update in FROM clause"), you can again use a temporary table:

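-- Collect the lowest id of each (user_id, business_id) pair: the rows to keep
CREATE TEMPORARY TABLE tmp
SELECT MIN(id) AS id FROM business_users GROUP BY user_id, business_id;
-- Delete every row whose id is not in the keep list
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;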

If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.

Since you have a primary key, you can use that to pick which rows to keep:

delete from business_users
where id not in (
    select id from (
        select min(id) as id -- Make a list of the primary keys to keep
        from business_users
        group by user_id, business_id -- Group by your duplicated row definition
    ) as a -- Derived table to force an implicit temp table
);

This way, you don't need to create or drop temporary tables yourself (apart from the implicit one used for the derived table).
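For comparison, a different technique that avoids the subquery altogether is a multi-table DELETE with a self-join. This is not what the answer above uses; it is a sketch of a common MySQL idiom with the same effect, and the aliases dup and orig are names chosen here for illustration.

```sql
-- Delete every row for which another row with the same
-- (user_id, business_id) pair and a lower id exists.
-- Only the row with the smallest id of each pair survives.
DELETE dup
FROM business_users AS dup
JOIN business_users AS orig
  ON  orig.user_id     = dup.user_id
  AND orig.business_id = dup.business_id
  AND orig.id          < dup.id;
```

Because the join compares with =, rows where user_id or business_id is NULL never match each other and are left untouched.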

You might want to put a unique constraint on (user_id, business_id) so you don't have to worry about this again.
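A minimal sketch of such a constraint, assuming MySQL and that the duplicates have already been removed (otherwise the statement fails with a duplicate-entry error). The constraint name uq_user_business is hypothetical; any unused name will do.

```sql
-- Reject future duplicate (user_id, business_id) pairs.
-- Assumption: no duplicate pairs remain in the table.
ALTER TABLE business_users
    ADD CONSTRAINT uq_user_business UNIQUE (user_id, business_id);
```

Naming the constraint makes it easier to drop or refer to later than the auto-generated name.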
