Delete all the duplicates except one
We have a table business_users with a user_id and a business_id, and it contains duplicates. How can I write a query that will delete all duplicates except for one?
If you want to get rid of completely identical rows, as I understood your question at first, then you can select the unique rows into a separate table and recreate the table data from that.
CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
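To see the DISTINCT rebuild in action without touching a real database, here is a minimal sketch. It assumes Python's built-in sqlite3 module in place of MySQL, and the sample rows are made up for illustration. SQLite requires the AS keyword in CREATE TABLE ... AS SELECT, which MySQL lets you omit.

```python
import sqlite3

# Hypothetical sample data: no primary key, so whole rows can repeat.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_users (user_id INTEGER, business_id INTEGER)")
conn.executemany(
    "INSERT INTO business_users VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (2, 20), (2, 20), (2, 20)],
)

# Same four steps as the MySQL snippet, in SQLite syntax.
conn.executescript("""
    CREATE TEMPORARY TABLE tmp AS SELECT DISTINCT * FROM business_users;
    DELETE FROM business_users;
    INSERT INTO business_users SELECT * FROM tmp;
    DROP TABLE tmp;
""")

rows = sorted(conn.execute("SELECT user_id, business_id FROM business_users"))
print(rows)
```

Note that this only collapses rows that are identical in every column, so it would not help on a table where each row carries its own id.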
If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.
CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.
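The constraint-then-reload approach can be tried out with the following sketch. It assumes Python's built-in sqlite3 module rather than MySQL, so two statements are swapped for their SQLite counterparts: CREATE UNIQUE INDEX stands in for ALTER TABLE ... ADD UNIQUE, and INSERT OR IGNORE stands in for INSERT IGNORE. The sample rows and the index name uq_user_business are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_users ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, business_id INTEGER)"
)
# Hypothetical sample data with repeated (user_id, business_id) pairs.
conn.executemany(
    "INSERT INTO business_users (user_id, business_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (2, 20), (2, 20)],
)

conn.executescript("""
    CREATE TEMPORARY TABLE tmp AS SELECT * FROM business_users;
    DELETE FROM business_users;
    -- SQLite counterpart of: ALTER TABLE business_users ADD UNIQUE (user_id, business_id)
    CREATE UNIQUE INDEX uq_user_business ON business_users (user_id, business_id);
    -- SQLite counterpart of: INSERT IGNORE INTO business_users SELECT * FROM tmp
    INSERT OR IGNORE INTO business_users SELECT * FROM tmp;
    DROP TABLE tmp;
""")

pairs = conn.execute(
    "SELECT user_id, business_id FROM business_users "
    "ORDER BY user_id, business_id"
).fetchall()
print(pairs)
```

Rows that would violate the new uniqueness rule are silently skipped during the reload, so exactly one row per pair survives.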
If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:
DELETE FROM business_users WHERE id NOT IN
(SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);
A similar idea was previously suggested by this answer.
If the above statement fails because you are not allowed to read from and delete from the same table in one step (MySQL reports this as error 1093, "You can't specify target table ... for update in FROM clause"), you can again use a temporary table:
CREATE TEMPORARY TABLE tmp
SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;
If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.
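The keep-list variant with a temporary table can be checked on a small data set as sketched below. This assumes Python's built-in sqlite3 module instead of MySQL (hence the explicit AS before SELECT), and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_users ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, business_id INTEGER)"
)
# Hypothetical sample data; ids are given explicitly so the result is easy to follow.
conn.executemany(
    "INSERT INTO business_users (id, user_id, business_id) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 10), (3, 2, 10), (4, 2, 20), (5, 2, 20)],
)

conn.executescript("""
    -- One id to keep per (user_id, business_id) pair: the smallest.
    CREATE TEMPORARY TABLE tmp AS
        SELECT MIN(id) AS id FROM business_users GROUP BY user_id, business_id;
    DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
    DROP TABLE tmp;
""")

kept_ids = [row[0] for row in conn.execute("SELECT id FROM business_users ORDER BY id")]
print(kept_ids)
```

Using MIN(id) keeps the oldest row of each group; MAX(id) would keep the newest instead.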
Since you have a primary key, you can use that to pick which rows to keep:
delete from business_users
where id not in (
select id from (
select min(id) as id -- Make a list of the primary keys to keep
from business_users
group by user_id, business_id -- Group by your duplicated row definition
) as a -- Derived table to force an implicit temp table
);
In this way, you won't need to create or drop temp tables yourself (apart from the implicit one that the derived table produces).
You might want to put a unique constraint on (user_id, business_id) so you don't have to worry about this again.
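As a rough illustration of both points, the sketch below runs the derived-table delete and then adds a uniqueness rule so that a later duplicate insert is rejected. It assumes Python's built-in sqlite3 module rather than MySQL; SQLite does not need the derived-table wrapper, but it accepts it unchanged. The sample rows and the index name uq_user_business are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_users ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, business_id INTEGER)"
)
# Hypothetical sample data with repeated (user_id, business_id) pairs.
conn.executemany(
    "INSERT INTO business_users (user_id, business_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (2, 20), (2, 20)],
)

# Single-statement delete with the keep-list wrapped in a derived table.
conn.execute("""
    DELETE FROM business_users
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT MIN(id) AS id
            FROM business_users
            GROUP BY user_id, business_id
        ) AS a
    )
""")
remaining = conn.execute("SELECT COUNT(*) FROM business_users").fetchone()[0]

# With the data clean, a unique index blocks new duplicates.
conn.execute(
    "CREATE UNIQUE INDEX uq_user_business ON business_users (user_id, business_id)"
)
try:
    conn.execute("INSERT INTO business_users (user_id, business_id) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(remaining, duplicate_rejected)
```

The index has to be created after the cleanup, since creating it while duplicates still exist would fail.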