
How to optimize this DB operation?

I'm quite sloppy with databases, can't get this working with joins, and I'm not even sure that would be faster...

DELETE FROM atable 
WHERE  btable_id IN (SELECT id 
                     FROM   btable 
                     WHERE  param > 2) 
       AND ctable_id IN (SELECT id 
                         FROM   ctable 
                         WHERE  ( someblob LIKE '%_ID1_%' 
                                  OR someblob LIKE '%_ID2_%' )) 

Atable contains ~19M rows, and this would delete ~3M of them. At the moment, I can only run the query with LIMIT 100000, and I don't want to sit here with phpMyAdmin all day, because each deletion (of 100,000 rows) runs for about 1.5 minutes.

Any ways to speed this up / automate it?

MySQL 5.5

(Do you think it's already bad DB design if any table contains 20M rows?)

Use EXISTS or JOIN instead of IN to improve performance.

Using EXISTS:

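-- MySQL 5.5 does not accept a table alias in a single-table DELETE,
-- so Atable is referenced by name inside the subqueries.
DELETE FROM Atable 
WHERE EXISTS (SELECT 1 FROM Btable B WHERE Atable.Btable_id = B.id AND B.param > 2) AND 
      EXISTS (SELECT 1 FROM Ctable C WHERE Atable.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%'))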

Using JOIN:

DELETE A 
FROM Atable A 
INNER JOIN Btable B ON A.Btable_id = B.id AND B.param > 2
INNER JOIN Ctable C ON A.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%')
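
On the "automate it" part of the question: MySQL's single-table DELETE accepts LIMIT, while the multi-table `DELETE ... JOIN` syntax does not, so batching fits the EXISTS form. The following is only a sketch under some assumptions: the procedure name `purge_atable_in_batches` and its `batch_size` parameter are made up for illustration, autocommit is assumed to be on so that each batch commits on its own, and a variable in LIMIT inside a stored procedure needs MySQL 5.5.6 or later.

```sql
DELIMITER //

-- Hypothetical helper: repeats the delete in batches until nothing is left.
CREATE PROCEDURE purge_atable_in_batches(IN batch_size INT)
BEGIN
    DECLARE affected INT DEFAULT 1;

    WHILE affected > 0 DO
        -- Single-table DELETE: no alias on atable (MySQL 5.5), LIMIT allowed.
        DELETE FROM atable
        WHERE EXISTS (SELECT 1 FROM btable b
                      WHERE b.id = atable.btable_id AND b.param > 2)
          AND EXISTS (SELECT 1 FROM ctable c
                      WHERE c.id = atable.ctable_id
                        AND (c.someblob LIKE '%_ID1_%'
                             OR c.someblob LIKE '%_ID2_%'))
        LIMIT batch_size;

        -- ROW_COUNT() is the number of rows removed by the DELETE above.
        SET affected = ROW_COUNT();
    END WHILE;
END //

DELIMITER ;

-- Same batch size as in the question.
CALL purge_atable_in_batches(100000);
```

This does not make an individual batch faster; it only removes the need to re-run the statement by hand. `DELIMITER` is a client command, so in phpMyAdmin the delimiter is set in the query box's delimiter field instead.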

Besides optimizing the query, you could also look at making good use of indexes, since they might prevent a full table scan.

For BTable, for example, create an index on id and param.

To explain why this helps: if the database has to look up the id and param values in the table in an unsorted manner, it has to read ALL rows. If it reads the index, which is SORTED, it can look up id and param at a reduced cost.
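
As a rough illustration of the index suggestion, something like the following could be tried. The index name `idx_btable_param_id` is made up, and the column order (param first, then id) is an assumption chosen so that the `param > 2` range condition can use the index. EXPLAIN in MySQL 5.5 only works on SELECT statements, so the sketch checks the subquery rather than the DELETE itself.

```sql
-- Hypothetical index name; param leads so the range filter can use it.
CREATE INDEX idx_btable_param_id ON btable (param, id);

-- Check whether the optimizer picks the index for the subquery.
-- The "key" column of the output names the index used, or NULL if none.
EXPLAIN
SELECT id
FROM   btable
WHERE  param > 2;

-- Caveat: a pattern with a leading wildcard, such as LIKE '%_ID1_%',
-- cannot use an ordinary B-tree index, so the ctable filter still
-- has to examine every row of ctable.
```

Whether the index is actually chosen depends on how selective `param > 2` is, so the EXPLAIN output is worth checking before and after creating it.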

First, you should try EXISTS instead of IN. It's faster in many cases.

Then you could try an inner join instead of IN or EXISTS.

Example:

delete a 
from a 
inner join b on b.id = a.tablebid

And finally, if possible (I don't know whether you also have ID3 and further ids), try replacing the OR with something else. Sometimes a strange, more complicated rewrite helps the optimizer: CASE WHEN, a subquery...
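
One way to read "replace the OR with something else" is to collect the matching ctable ids first, with a UNION instead of the OR, and then join against that small set. This is a sketch, not a measured improvement: the temporary table name `matching_c` is made up, and `INT` for its id column is an assumption that should be changed to ctable.id's real type.

```sql
-- Hypothetical scratch table holding the ctable ids that match either pattern.
CREATE TEMPORARY TABLE matching_c (
    id INT NOT NULL PRIMARY KEY   -- assumed type; use the type of ctable.id
);

-- UNION (not UNION ALL) removes duplicates, so a row matching both
-- patterns is inserted only once. Patterns are copied from the question;
-- note that "_" in LIKE is itself a single-character wildcard.
INSERT INTO matching_c (id)
SELECT id FROM ctable WHERE someblob LIKE '%_ID1_%'
UNION
SELECT id FROM ctable WHERE someblob LIKE '%_ID2_%';

-- The delete now joins on primary keys only.
DELETE a
FROM atable a
INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
INNER JOIN matching_c m ON a.ctable_id = m.id;

DROP TEMPORARY TABLE matching_c;
```

The UNION scans ctable twice, so this only pays off if evaluating the blob filter once up front is cheaper than evaluating it inside the delete. Timing both variants on a copy of the data would settle it.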

I don't see where a simple index would help much. I'd do:

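-- MySQL rejects a subquery that reads the table being deleted from
-- (error 1093) unless it is wrapped in a derived table, hence the extra level.
delete from atable where id in (
    select id from (
        select
            a.id
        from
            atable a
            join btable b on a.btable_id = b.id
            join ctable c on a.ctable_id = c.id
        where
            b.param > 2
            and (
                c.someblob LIKE '%_ID1_%' 
                OR c.someblob LIKE '%_ID2_%'
            )
    ) as matched
)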

Correction: I'm assuming you've got indexes on btable's and ctable's ids (probably, if they're primary keys...) and on b.param (if it's numeric).

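Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.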

 