How to make deleting duplicates faster?
On a table with about 1.7M rows, I tried to delete duplicate posts:
delete a FROM comment a
INNER JOIN comment a2
WHERE a.id < a2.id
AND a.body = a2.body;
The result was:
Query OK, 35071 rows affected (5 hours 36 min 48.79 sec)
This happened on my almost idle workstation with an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz.
I'm wondering if there are some tricks to make this delete operation faster?
The query below should be useful for you.
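-- Deleting through a derived table is SQL Server-style syntax; MySQL does not accept it.
Delete YourTableName
From (
Select row_number() over(Partition by ColName1,ColName2,ColName3 order by ColName1,ColName2,ColName3 Asc) As RowNumber
From YourTableName
) YourTableName
Where YourTableName.RowNumber>1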
If it works, kindly mark it as the answer.
Your query attempts to delete the same row many times over. For instance, if you have this data:
body id
a 1
a 2
a 3
a 4
Then your query attempts the following deletions (one per matched pair, using the aliases from your query):
a.body a.id a2.id
a 1 4
a 1 3
a 1 2
a 2 4
a 2 3
a 3 4
You can see how this would result in lots of work for the database as the number of ids on a given body increases.
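To put a number on that growth: for k rows sharing one body, the condition a.id < a2.id matches every unordered pair, which is k*(k-1)/2 matches. The sketch below is only an illustration in Python, not something from the original query; the helper names join_matches and match_count are made up for this example.

```python
from itertools import combinations


def join_matches(ids):
    """Return the (a.id, a2.id) pairs that a.id < a2.id produces for one body."""
    # combinations() over the sorted ids yields each pair once, smaller id first.
    return list(combinations(sorted(ids), 2))


def match_count(k):
    """Number of matched pairs for k rows sharing the same body: k*(k-1)/2."""
    return k * (k - 1) // 2


pairs = join_matches([1, 2, 3, 4])
print(pairs)               # the six pairs listed in the table above
print(match_count(4))      # 6
print(match_count(1000))   # 499500
```

So a body that was posted 1000 times produces roughly half a million matches, even though only 999 rows need to go.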
You can fix this using group by instead:
delete c
from comment c join
(select c2.body, max(c2.id) as max_id
from comment c2
group by c2.body
) c2
on c2.body = c.body and c.id < c2.max_id;
In addition, you want an index on comment(body, id).
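If you want to try the "keep only the highest id per body" idea on toy data before touching the real table, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in, not the MySQL statement itself: SQLite has no DELETE ... JOIN, so the sketch uses a NOT IN subquery over the same GROUP BY (MySQL rejects that form because the subquery reads the table being deleted from, which is why the join is used there). The index name idx_comment_body_id and the helper dedupe_keep_max_id are hypothetical.

```python
import sqlite3


def dedupe_keep_max_id(conn):
    """Delete every comment except the highest id per body; return rows removed."""
    cur = conn.execute(
        """
        DELETE FROM comment
        WHERE id NOT IN (SELECT MAX(id) FROM comment GROUP BY body)
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
# Composite index matching the grouping column and the id being compared.
conn.execute("CREATE INDEX idx_comment_body_id ON comment (body, id)")
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "b")],
)

removed = dedupe_keep_max_id(conn)
remaining = conn.execute("SELECT id, body FROM comment ORDER BY id").fetchall()
print(removed)     # 3
print(remaining)   # [(4, 'a'), (5, 'b')]
```

Each duplicate is visited once here, instead of once per pair.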
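An anti-join is also worth knowing about, with one caveat: it matches the rows that have no later duplicate, which are the rows you want to keep. Used directly in a delete, it would remove the newest copy of every body, including comments that have no duplicates at all. It works better for copying the survivors into a new table (comment_dedup is just a placeholder name, and the new table does not inherit the original's indexes):
create table comment_dedup as
select c.*
from comment c left join
comment c2
on c2.body = c.body and c2.id > c.id
where c2.id is null;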
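To see which rows that anti-join condition matches, it helps to run it as a plain SELECT on a few rows first. The sketch below does so with Python's sqlite3 module on the sample data from above; the helper name rows_matched_by_anti_join is made up for this example, and SQLite stands in for MySQL here.

```python
import sqlite3


def rows_matched_by_anti_join(conn):
    """Return ids of rows with no other row of the same body and a higher id."""
    query = """
        SELECT c.id
        FROM comment c
        LEFT JOIN comment c2
          ON c2.body = c.body AND c2.id > c.id
        WHERE c2.id IS NULL
        ORDER BY c.id
    """
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "b")],
)

matched = rows_matched_by_anti_join(conn)
print(matched)   # [4, 5]
```

The matched ids are the newest copy of "a" and the only copy of "b", so this condition picks out the survivors, not the duplicates.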