Improve performance of delete query

I want to delete entries from table B so that only a single entry remains per A_id (the one with the highest id).

Table A:

+----+------------+
| id |    name    |
+----+------------+
|  1 | Some name  |
|  2 | Other name |
+----+------------+

Table B:

+----+-------+------+
| id | stuff | A_id |
+----+-------+------+
|  1 | aab   |    1 |
|  2 | aac   |    1 |
|  3 | aad   |    2 |
|  4 | aae   |    1 |
|  5 | aak   |    1 |
|  6 | aal   |    2 |
+----+-------+------+

My current query (which works fine):

DELETE FROM B 
WHERE id NOT IN (SELECT MAX(id)
                 FROM B
                 GROUP BY A_id)

This gives the correct result:

+----+-------+------+
| id | stuff | A_id |
+----+-------+------+
|  5 | aak   |    1 |
|  6 | aal   |    2 |
+----+-------+------+

But it is very slow when there are many rows in table B. Is there any way to improve the performance of the query (or perhaps do it in an entirely different way)?

You are deleting a large number of rows. That is the problem. There is a lot of overhead in deletions.

If you are deleting a significant number of rows in a table (and significant might only be a few percent), then it is often faster to recreate the table:

select b.*
into temp_b  -- actually, I wouldn't use a temporary table in case the server goes down
from b
where b.id = (select max(b2.id) from b b2 where b2.a_id = b.a_id);  -- keep only the highest id per a_id

truncate table b;

insert into b
    select *
    from temp_b;

Before attempting this, be sure that you have backed up b or at least stashed a copy of it somewhere.

Note that I changed the structure of the NOT IN. I strongly discourage the use of NOT IN, because the semantics are not intuitive when the subquery returns NULL values. If the subquery returned even a single NULL value, the NOT IN condition would never evaluate to TRUE, so no rows would match. Even if NULL values are not a problem in this case, I strongly recommend using other alternatives so you won't have a problem when NULLs are a possibility.
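To illustrate the NULL behaviour and one NULL-safe alternative, here is a minimal sketch. It assumes SQL Server syntax and the table and column names from the question; the EXISTS form is one possible rewrite, not necessarily the one the answer had in mind. Some engines (MySQL, for example) restrict subqueries that reference the table being deleted from.

```sql
-- 3 NOT IN (1, 2, NULL) evaluates to UNKNOWN rather than TRUE,
-- so this query returns no rows even though 3 is not in the list.
SELECT 1 AS hit
WHERE 3 NOT IN (1, 2, NULL);

-- NULL-safe rewrite of the delete: remove every row of b for which
-- another row with the same a_id and a higher id exists.
-- Rows whose a_id is NULL never satisfy the equality and are left untouched.
DELETE FROM b
WHERE EXISTS (SELECT 1
              FROM b b2
              WHERE b2.a_id = b.a_id
                AND b2.id > b.id);
```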

For performance on the SELECT, you want an index on b(a_id, id). You might find that such an index also helps your original query.
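Such an index could be created as follows. The index name ix_b_a_id_id is a hypothetical choice; only the column order comes from the answer.

```sql
-- Composite index: a_id first so that rows for the same a_id are adjacent,
-- id second so that the highest id per a_id can be found from the index.
CREATE INDEX ix_b_a_id_id ON b (a_id, id);
```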

Your query looks fine to me.

Your problem seems to be that you have a very large amount of data and need ways to optimize performance.

What you can do is materialize your subquery and make sure the resulting column (call it max_id) is indexed, for example by making it a primary key.

So create a temporary table Max_B and store the results of your subquery in it. Then perform the delete and drop the temporary table afterwards.
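The steps above might look like the following sketch. It assumes SQL Server (where the # prefix marks a temporary table) and that id is an INT; adjust the type and the temporary-table syntax for other systems.

```sql
-- Materialize the ids to keep; the primary key gives max_id an index.
CREATE TABLE #Max_B (max_id INT NOT NULL PRIMARY KEY);

INSERT INTO #Max_B (max_id)
SELECT MAX(id)
FROM B
GROUP BY A_id;

-- Delete every row whose id is not among the ids to keep.
-- max_id is declared NOT NULL, so NOT IN cannot be tripped up by NULLs here.
DELETE FROM B
WHERE id NOT IN (SELECT max_id FROM #Max_B);

DROP TABLE #Max_B;
```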
