Why is DELETE with a subquery much slower than with a simple list of IDs?

I wanted to delete a lot of rows from a medium-sized (700K rows) table, based on the primary key. I thought the best way would be to use a SELECT subquery as the source list for the DELETE, and I found a specific answer suggesting this too. The problem is that it is much slower than using two separate queries (first select the IDs, then delete those IDs from the table). Why is that so?

I made a simple test case too:

CREATE TABLE `xyz` (
  `xyzID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`xyzID`)
) ENGINE=InnoDB;

I populated it with a million records, and then ran:

DELETE FROM xyz
WHERE xyzID IN
        (
        SELECT xyzID
        FROM
            (
                SELECT xyzID
                FROM xyz
                LIMIT 3000,1000
            ) a
        );
Query OK, 1000 rows affected (53.52 sec)

Deleting 2000 rows doubles the time:

Query OK, 2000 rows affected (1 min 48.25 sec)

But deleting without the subquery (running the select first) took almost no time (the ID list here was generated randomly):

DELETE FROM test.xyz WHERE xyzID IN ( 660422,232794,573802,....
Query OK, 996 rows affected (0.04 sec)

Why is deleting with a subquery so slow?

If you read the documentation on subqueries, you will find some things that might be the cause of this: https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html

The optimizer will rewrite your uncorrelated WHERE ... IN (subquery) statement into a correlated one using EXISTS.

So your query might actually be executed like this:

-- Conceptual form of the rewritten query (for illustration only)
DELETE FROM xyz t1
WHERE EXISTS
    (
    SELECT 1
    FROM
        (
            SELECT xyzID
            FROM xyz
            LIMIT 3000,1000
        ) a
    WHERE t1.xyzID = a.xyzID
    );

The correlated subquery now has to be executed for every row of xyz that the DELETE examines, not once for the whole statement.

So to delete 1000 rows, the subquery against the temporary table a is run at least 1000 times, and potentially once per row in xyz, each time searching through the rows of a. That would also explain why doubling the number of rows in a roughly doubles the run time. Only the inner query remains uncorrelated.

Compared to IN (value_list), you are running well over a thousand queries rather than 1.

From the documentation:

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.
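One way to check whether this is what happens on your server is to look at the execution plan. The following is a minimal sketch, assuming MySQL 5.6.3 or later (where EXPLAIN accepts DELETE statements). It reuses the table from the question and does not delete anything.

```sql
-- Shows the plan for the slow statement without executing it.
-- Assumes MySQL 5.6.3+; older versions only support EXPLAIN on SELECT.
EXPLAIN
DELETE FROM xyz
WHERE xyzID IN
        (
        SELECT xyzID
        FROM
            (
                SELECT xyzID
                FROM xyz
                LIMIT 3000,1000
            ) a
        );
```

If the row for the IN subquery shows DEPENDENT SUBQUERY in the select_type column, the subquery is being treated as correlated and re-evaluated per outer row, which matches the behavior described above.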

The first step to solving this problem is to select the IDs you want to delete into a temporary table. However, you might still run into the slow subquery problem when you try to actually do the delete.

The solution to that is to use the DELETE xyz FROM xyz INNER JOIN xyz_temp ON xyz.id = xyz_temp.id syntax, which achieves the same thing and runs as fast as a simple join.
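Applied to the test table from the question, the two steps might look roughly like this. It is a sketch rather than a tested recipe: xyz_temp is only an illustrative name, the column is called xyzID here to match the question's schema (the answer writes id), and the ORDER BY is an addition so that the LIMIT picks a well-defined range.

```sql
-- Step 1: collect the IDs to delete in a temporary table.
-- xyz_temp is a hypothetical name; it lives only for this session.
CREATE TEMPORARY TABLE xyz_temp (
  xyzID int(10) unsigned NOT NULL,
  PRIMARY KEY (xyzID)
) ENGINE=InnoDB;

INSERT INTO xyz_temp (xyzID)
SELECT xyzID
FROM xyz
ORDER BY xyzID          -- added assumption: makes the LIMIT range deterministic
LIMIT 3000,1000;

-- Step 2: multi-table DELETE; only rows of xyz are removed,
-- matched against xyz_temp through a primary-key join.
DELETE xyz
FROM xyz
INNER JOIN xyz_temp ON xyz.xyzID = xyz_temp.xyzID;

-- Clean up explicitly instead of waiting for the session to end.
DROP TEMPORARY TABLE xyz_temp;
```

Giving xyz_temp a primary key lets the join look up each ID through an index on both sides, which is what should keep the delete close to the speed of the plain IN (value_list) form.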

Using a subquery means you are asking the database engine to compare all N rows in the first table with all M rows of another table that is being created at that moment. That means N*M comparison operations, and to perform them the tables have to be joined. The table you are building has N * M rows.

Without the subquery you are just comparing all N rows in your table with X keywords, where X << M.
