How to make deleting duplicates faster?

On a table with about 1.7M rows, I tried to delete duplicate posts:

delete a FROM comment a
  INNER JOIN comment a2
     WHERE a.id < a2.id
     AND   a.body = a2.body;

The result was:

  Query OK, 35071 rows affected (5 hours 36 min 48.79 sec)

This happened on my almost idle workstation with an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz. I'm wondering whether there are any tricks to make this delete operation faster.

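For MySQL specifically, you can add a unique index with ALTER IGNORE TABLE, which silently drops the rows that would violate it. The index has to be on body alone: the duplicate rows have different ids, so a unique index on (id, body) would not remove anything. ALTER IGNORE TABLE only works up to MySQL 5.6; the IGNORE clause was removed in MySQL 5.7.4. If body is a TEXT column, MySQL also requires a prefix length such as body(255), and rows are then compared only on that prefix.

ALTER IGNORE TABLE comment ADD UNIQUE INDEX idx_name (body);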
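ALTER IGNORE TABLE is not accepted by MySQL 5.7.4 and later, so one workaround is to copy the rows into a new table that already carries the unique constraint and let the conflicting rows be skipped. The minimal sketch below illustrates that idea with Python's built-in sqlite3 module, where INSERT OR IGNORE stands in for MySQL's INSERT IGNORE. The two-column schema (an integer id and a text body) and the comment_dedup table name are assumptions for the example. Copying in descending id order keeps the highest id for each body, which is the row the question's DELETE keeps.

```python
import sqlite3


def dedupe_via_unique_copy(conn: sqlite3.Connection) -> None:
    """Rebuild `comment` so each body appears once, keeping the highest id."""
    conn.executescript(
        """
        -- New table with the uniqueness rule built in (assumed schema).
        CREATE TABLE comment_dedup (id INTEGER PRIMARY KEY, body TEXT UNIQUE);
        -- Rows arrive in descending id order, so the highest id for each body
        -- is inserted first and later (lower-id) duplicates are skipped.
        INSERT OR IGNORE INTO comment_dedup (id, body)
            SELECT id, body FROM comment ORDER BY id DESC;
        DROP TABLE comment;
        ALTER TABLE comment_dedup RENAME TO comment;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "c")],
)
dedupe_via_unique_copy(conn)
print(conn.execute("SELECT id, body FROM comment ORDER BY id").fetchall())
```

Because the constraint lives on the new table, the same schema also prevents new duplicates from being inserted later.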


The query below (SQL Server syntax) numbers the rows within each group of duplicates with ROW_NUMBER() and deletes every row numbered above 1:

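-- SQL Server (T-SQL) syntax. ColName1..ColName3 are the columns that define a duplicate;
-- ordering by those same columns leaves it arbitrary which copy gets RowNumber = 1 and survives.
Delete  t
From    (
        Select  row_number() over(Partition by ColName1, ColName2, ColName3 order by ColName1, ColName2, ColName3 Asc) As RowNumber
        From    YourTableName
        ) t
Where   t.RowNumber > 1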
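SQLite, for example, can only DELETE from a named table, so the minimal sketch below runs the ROW_NUMBER() idea from Python's sqlite3 module by collecting the ids numbered above 1 and deleting those by id. Window functions need SQLite 3.25 or later. Partitioning by body and ordering by id DESC, so that the highest id per body survives as in the question, are choices made for this example, and the two-column schema is assumed.

```python
import sqlite3

# Number the rows inside each group of equal bodies; row 1 (highest id) is kept,
# and every id numbered 2, 3, ... is deleted.
DELETE_NUMBERED_DUPLICATES = """
DELETE FROM comment
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY body ORDER BY id DESC) AS rn
        FROM comment
    )
    WHERE rn > 1
)
"""


def dedupe_with_row_number(conn: sqlite3.Connection) -> int:
    """Delete every row except the highest id per body; return rows deleted."""
    cur = conn.execute(DELETE_NUMBERED_DUPLICATES)
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "c")],
)
deleted = dedupe_with_row_number(conn)
print(deleted, conn.execute("SELECT id, body FROM comment ORDER BY id").fetchall())
```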


Your query attempts to delete the same row many times over. For instance, if you have this data:

body   id
  a     1
  a     2
  a     3
  a     4

Then your query attempts the following deletions, one for every matching (a, a2) pair:

 a.body   a.id  a2.id
  a         1      4
  a         1      3
  a         1      2
  a         2      4
  a         2      3
  a         3      4

You can see how this creates a lot of work for the database as the number of ids for a given body increases: a body that appears n times produces n(n-1)/2 pairs, although only n - 1 of its rows need to be deleted.
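To see that growth directly, the minimal sketch below loads the sample data above into an in-memory SQLite database (Python's sqlite3 module, assumed two-column schema) and runs the question's join as a SELECT. Four copies give 6 pairs; by the same formula, 1,000 copies of one body would give 499,500 pairs while only 999 rows need deleting.

```python
import sqlite3
from typing import List, Tuple


def join_pairs(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """Return every (body, a.id, a2.id) pair matched by the question's self-join."""
    return conn.execute(
        """
        SELECT a.body, a.id, a2.id
        FROM comment AS a
        JOIN comment AS a2 ON a.id < a2.id AND a.body = a2.body
        ORDER BY a.id, a2.id DESC
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
# Sample data from above: one body with four copies.
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "a")],
)
pairs = join_pairs(conn)
rows_to_delete = sorted({a_id for _, a_id, _ in pairs})
print(len(pairs), rows_to_delete)  # 6 [1, 2, 3]
```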

You can fix this by using GROUP BY instead, so that each row is compared only with the highest id for its body:

delete c 
    from comment c join
         (select c2.body, max(c2.id) as max_id
          from comment c2
          group by c2.body
         ) c2
         on c2.body = c.body and c.id < c2.max_id;
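SQLite does not support MySQL's multi-table DELETE c FROM ... JOIN form, so the minimal sketch below expresses the same keep-the-max-id rule as a correlated subquery instead of a GROUP BY join: a row is deleted when its id is below the maximum id for its body. It uses Python's sqlite3 module, the question's table and column names, and an assumed two-column schema.

```python
import sqlite3

# Delete a row when a higher id exists for the same body, i.e. keep MAX(id) per body.
# Rows with a NULL body are left alone, because c2.body = NULL never matches.
DELETE_BELOW_MAX = """
DELETE FROM comment
WHERE id < (
    SELECT MAX(c2.id) FROM comment AS c2 WHERE c2.body = comment.body
)
"""


def dedupe_keep_max(conn: sqlite3.Connection) -> int:
    """Run the delete and return how many duplicate rows were removed."""
    deleted = conn.execute(DELETE_BELOW_MAX).rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "c")],
)
deleted = dedupe_keep_max(conn)
print(deleted, conn.execute("SELECT id, body FROM comment ORDER BY id").fetchall())
```

Each row is now compared against a single maximum rather than against every later copy of its body.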

In addition, you want an index on comment(body, id), so that the highest id for each body can be read from the index.
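The index helps because finding the highest id for a body becomes an index lookup rather than a scan of the table. The minimal sketch below checks this in SQLite through Python's sqlite3 module: it creates an index on comment(body, id) under the hypothetical name idx_comment_body_id and uses EXPLAIN QUERY PLAN to confirm that a correlated MAX(id)-per-body delete reads it. MySQL's EXPLAIN output has a different format, so this only illustrates the idea.

```python
import sqlite3
from typing import List

# A keep-the-max delete whose inner MAX(id) lookup the index is meant to serve.
DELETE_BELOW_MAX = """
DELETE FROM comment
WHERE id < (
    SELECT MAX(c2.id) FROM comment AS c2 WHERE c2.body = comment.body
)
"""


def plan_details(conn: sqlite3.Connection) -> List[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + DELETE_BELOW_MAX).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
# Hypothetical index name; the column order (body, id) is what matters.
conn.execute("CREATE INDEX idx_comment_body_id ON comment (body, id)")
for line in plan_details(conn):
    print(line)  # the subquery's line should mention idx_comment_body_id
```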

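You might also find that an anti-join works better than the join you are attempting. The rows to keep are the highest id for each body, so anti-join against that set and delete every row that does not match it:

delete c
    from comment c left join
         (select max(c2.id) as id
          from comment c2
          group by c2.body
         ) kept
         on kept.id = c.id
    where kept.id is null;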
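An anti-join against the set of rows to keep (the highest id for each body) can also pick out the ids to delete in databases without a multi-table DELETE. In the minimal sketch below (Python's sqlite3 module, assumed two-column schema), the rows with no match among those kept ids are exactly the duplicates; kept is an illustrative alias.

```python
import sqlite3

# Anti-join: the derived table holds the ids to keep (highest id per body);
# every row with no match in it is a duplicate and gets deleted.
DELETE_ANTI_JOIN = """
DELETE FROM comment
WHERE id IN (
    SELECT c.id
    FROM comment AS c
    LEFT JOIN (SELECT MAX(id) AS id FROM comment GROUP BY body) AS kept
           ON kept.id = c.id
    WHERE kept.id IS NULL
)
"""


def dedupe_anti_join(conn: sqlite3.Connection) -> int:
    """Delete rows that are not the highest id for their body; return the count."""
    deleted = conn.execute(DELETE_ANTI_JOIN).rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "c")],
)
deleted = dedupe_anti_join(conn)
print(deleted, conn.execute("SELECT id, body FROM comment ORDER BY id").fetchall())
```

Note that the rows surviving the anti-join's null test are the rows to delete here, because the join partner is the keep set rather than the other copies.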

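Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.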

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM