Improving the performance of a DELETE with IN

I am struggling to write a DELETE query in a MariaDB 5.5.44 database.

The first of the two code samples below works great, but I need to add another condition to its WHERE clause, as shown in the second sample.

I need to delete only those rows from polozkyTransakci whose transaction in the transakce_tmp table has puvodFaktury <> 'FAKTURA VO CZ'. I thought the WHERE clause in the second sample, with its inner SELECT, would do this, but it takes forever to run (about 40 minutes in my cloud-based ETL tool), and even then it does not leave the rows I want untouched.

1.

DELETE FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy';

2.

DELETE FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy'
  AND idTransakce NOT IN (
      SELECT idTransakce
      FROM transakce_tmp
      WHERE puvodFaktury = 'FAKTURA VO CZ');

Thanks a million for any help,

David

A NOT IN subquery can perform very badly here. Try using NOT EXISTS() instead:

DELETE FROM polozkyTransakci 
WHERE typPolozky = 'odpocetZalohy'
     AND NOT EXISTS (SELECT 1
                     FROM transakce_tmp r
                     WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                          AND r.idTransakce = polozkyTransakci.idTransakce );

Before you can tune performance, you need to figure out why it is not deleting the correct rows.

So start by writing SELECT statements until you have identified the right rows. Build the SELECT up a bit at a time, checking the results at each stage to see whether you are getting the rows you want.
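A minimal sketch of that step-by-step approach, using the table and column names from the question. Each query adds one condition so the counts can be compared, and the last query is the candidate for the DELETE. The checks for items with no matching transaction and for NULL ids are guesses at what might be going wrong, not a diagnosis.

```sql
-- Step 1: everything the original, working DELETE removes.
SELECT COUNT(*)
FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy';

-- Step 2: items whose transaction is missing from transakce_tmp entirely.
-- The NOT EXISTS delete removes these too, since no 'FAKTURA VO CZ' row exists for them.
SELECT COUNT(*)
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
  AND NOT EXISTS (SELECT 1 FROM transakce_tmp r
                  WHERE r.idTransakce = pt.idTransakce);

-- Step 3: a single NULL idTransakce returned by the subquery makes
-- "idTransakce NOT IN (...)" true for no row at all, so check for NULLs.
SELECT COUNT(*)
FROM transakce_tmp
WHERE puvodFaktury = 'FAKTURA VO CZ'
  AND idTransakce IS NULL;

-- Step 4: the rows the final DELETE should remove.
SELECT pt.*
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
  AND NOT EXISTS (SELECT 1 FROM transakce_tmp r
                  WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                    AND r.idTransakce = pt.idTransakce);
```

NOT IN and NOT EXISTS can give different results when NULLs are involved, so it is worth comparing the step 4 count with a COUNT(*) of the NOT IN form before switching.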

Once you have the SELECT, you can convert it to a DELETE. When testing the DELETE, do it in a transaction and run some tests on the data that is left behind to ensure it deleted properly before rolling back or committing. Since you will likely want to tune performance, I would suggest rolling back, so that you can then run the tuned version and confirm that it gives the same results. Of course, you only want to do this on a dev server!
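A sketch of that test cycle in MariaDB, assuming the tables use a transactional engine such as InnoDB (changes to MyISAM tables cannot be rolled back). The per-type counts before and after the DELETE are just one example of testing the data left behind; add whatever checks matter for your data.

```sql
-- 0. Rollback only works on a transactional engine such as InnoDB.
SELECT ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'polozkyTransakci';

-- 1. Baseline: row counts per item type before the delete.
SELECT typPolozky, COUNT(*) FROM polozkyTransakci GROUP BY typPolozky;

START TRANSACTION;

DELETE FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy'
  AND NOT EXISTS (SELECT 1
                  FROM transakce_tmp r
                  WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                    AND r.idTransakce = polozkyTransakci.idTransakce);

-- 2. Rows removed by the DELETE; compare with the count from the final SELECT.
SELECT ROW_COUNT();

-- 3. Only the 'odpocetZalohy' count should have changed.
SELECT typPolozky, COUNT(*) FROM polozkyTransakci GROUP BY typPolozky;

-- 4. Undo, so the tuned version can be tested against the same data.
ROLLBACK;
```

ROW_COUNT() reports on the statement executed immediately before it, so it has to run right after the DELETE.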

Now, while I agree that NOT EXISTS may be faster, here are some other things you want to look at:

  • Do you have cascading deletes? If you end up deleting many child records, that could be part of the problem.
  • Are there triggers affecting the delete? Especially check whether someone set one up to process things row by row instead of as a set. Row-by-row triggers are a very bad thing when you delete many records. For instance, suppose you are deleting 50K records and you have a delete trigger that writes to an audit table. If it inserts into that table one record at a time, that insert is executed 50K times. If it inserts all the deleted records in one step, that single insert might take a bit longer, but the total execution time is much shorter. (MariaDB only supports FOR EACH ROW triggers, so any DELETE trigger on this table fires once per deleted row; the question there is how much work each firing does.)
  • What indexes do you have, and are they helping the delete?
  • Examine the execution plan (EXPLAIN) for each version of the query to see whether your changes actually improve how it is executed. On MariaDB 5.5, EXPLAIN accepts only SELECT statements (EXPLAIN for DELETE arrived in MariaDB 10.0), so explain the equivalent SELECT.
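One way to check the first three points on MariaDB is to query information_schema and SHOW INDEX. The table names come from the question; these queries only read metadata and change nothing.

```sql
-- Foreign keys in other tables that reference polozkyTransakci;
-- DELETE_RULE = 'CASCADE' means their child rows are deleted as well.
SELECT TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = DATABASE()
  AND REFERENCED_TABLE_NAME = 'polozkyTransakci';

-- DELETE triggers on the table and the statement each firing runs.
SELECT TRIGGER_NAME, ACTION_TIMING, ACTION_STATEMENT
FROM information_schema.TRIGGERS
WHERE EVENT_OBJECT_SCHEMA = DATABASE()
  AND EVENT_OBJECT_TABLE = 'polozkyTransakci'
  AND EVENT_MANIPULATION = 'DELETE';

-- Existing indexes on both tables.
SHOW INDEX FROM polozkyTransakci;
SHOW INDEX FROM transakce_tmp;
```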

Performance tuning is complex, and it is best to read up on it in detail in one of the performance tuning books available for your specific database.

I might be inclined to write the query as a LEFT JOIN, although I'm guessing this would have the same execution plan as NOT EXISTS:

DELETE pt
    FROM polozkyTransakci pt LEFT JOIN
         transakce_tmp tt
         ON pt.idTransakce = tt.idTransakce AND
            tt.puvodFaktury = 'FAKTURA VO CZ'
    WHERE pt.typPolozky = 'odpocetZalohy' AND tt.idTransakce IS NULL;

I would recommend these indexes, if you don't already have them: polozkyTransakci(typPolozky, idTransakce) and transakce_tmp(idTransakce, puvodFaktury). They would help the NOT EXISTS version as well.
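Creating those indexes could look like the sketch below. The index names are made up for illustration, and the statements assume the columns are VARCHAR or numeric; a TEXT column would need a prefix length such as puvodFaktury(50).

```sql
-- Hypothetical index names; skip a statement if SHOW INDEX already lists
-- an index with the same leading columns.
CREATE INDEX ix_polozky_typ_id
    ON polozkyTransakci (typPolozky, idTransakce);

CREATE INDEX ix_transakce_id_puvod
    ON transakce_tmp (idTransakce, puvodFaktury);
```

The second index contains every column the NOT EXISTS or LEFT JOIN lookup touches in transakce_tmp, so each lookup can in principle be answered from the index alone without reading the table rows.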

You can test the performance of these queries using a SELECT:

    SELECT pt.*
    FROM polozkyTransakci pt LEFT JOIN
         transakce_tmp tt
         ON pt.idTransakce = tt.idTransakce AND
            tt.puvodFaktury = 'FAKTURA VO CZ'
    WHERE pt.typPolozky = 'odpocetZalohy' AND tt.idTransakce IS NULL;

The DELETE should be somewhat slower (because of the cost of logging the changes), but the timings should be comparable.
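Because MariaDB 5.5 cannot EXPLAIN a DELETE, the SELECT forms are also the way to compare plans. A sketch comparing the LEFT JOIN and NOT EXISTS versions; in the output, the type, key and rows columns for transakce_tmp show whether an index is used for the lookup.

```sql
-- Plan for the LEFT JOIN form.
EXPLAIN
SELECT pt.*
FROM polozkyTransakci pt LEFT JOIN
     transakce_tmp tt
     ON pt.idTransakce = tt.idTransakce AND
        tt.puvodFaktury = 'FAKTURA VO CZ'
WHERE pt.typPolozky = 'odpocetZalohy' AND tt.idTransakce IS NULL;

-- Plan for the NOT EXISTS form.
EXPLAIN
SELECT pt.*
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
  AND NOT EXISTS (SELECT 1
                  FROM transakce_tmp r
                  WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                    AND r.idTransakce = pt.idTransakce);
```

Running both forms with COUNT(*) instead of pt.* (without EXPLAIN) is a quick way to confirm that they select the same number of rows.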
