Improve delete with IN performance

I am struggling to write a DELETE query for a MariaDB 5.5.44 database.

The first of the two code samples below works fine, but I need to add another condition to its WHERE clause, as shown in the second sample.

I need to delete rows from polozkyTransakci only where the matching row in the transakce_tmp table has puvodFaktury <> 'FAKTURA VO CZ'. I thought the WHERE clause in the second sample, with its inner SELECT, would do this, but it takes forever to run (about 40 minutes in my cloud-based ETL tool), and even then it does not leave the rows I want untouched.

1.

DELETE FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy';

2.

DELETE FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy'
     AND idTransakce NOT IN (
         SELECT idTransakce
         FROM transakce_tmp
         WHERE puvodFaktury = 'FAKTURA VO CZ');

Thanks a million for any help.

David

NOT IN with a subquery can perform very badly. Try NOT EXISTS instead:

DELETE FROM polozkyTransakci 
WHERE typPolozky = 'odpocetZalohy'
     AND NOT EXISTS (SELECT 1
                     FROM transakce_tmp r
                     WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                          AND r.idTransakce = polozkyTransakci.idTransakce );

Before you can performance tune, you need to figure out why it is not deleting the correct rows.

So start by writing SELECTs until you have identified the right rows. Build your SELECT a bit at a time, checking the results at each stage to see whether you are getting what you want.
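
As a rough sketch of what "a bit at a time" could look like here: the table and column names are taken from the question, the aliases (candidate_rows, row_count) are made up for illustration, and I am assuming idTransakce is what links the two tables.

-- Stage 1: how many rows does the simple filter from sample 1 hit?
SELECT COUNT(*) AS candidate_rows
FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy';

-- Stage 2: which puvodFaktury values really exist, and how many of each?
-- This shows whether the literal 'FAKTURA VO CZ' matches the stored data.
SELECT puvodFaktury, COUNT(*) AS row_count
FROM transakce_tmp
GROUP BY puvodFaktury;

-- Stage 3: the rows the final DELETE would remove.
SELECT pt.*
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
     AND NOT EXISTS (SELECT 1
                     FROM transakce_tmp r
                     WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                          AND r.idTransakce = pt.idTransakce);

If stage 3 returns rows you expected to keep, the problem is in the matching logic or the data, not in performance.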

Once you have the SELECT, you can convert it to a DELETE. When testing the DELETE, run it in a transaction and run some tests on the data that is left behind to ensure it deleted the right rows before rolling back or committing. Since you will likely want to performance tune, I would suggest rolling back, so that you can then try again with the tuned version and confirm you get the same results. Of course, you only want to do this on a dev server!
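
A minimal sketch of such a test run, using the NOT EXISTS form of the delete. It assumes the table uses a transactional engine such as InnoDB (on MyISAM the ROLLBACK would not undo anything), and the aliases in the checks are made up for illustration.

START TRANSACTION;

DELETE FROM polozkyTransakci
WHERE typPolozky = 'odpocetZalohy'
     AND NOT EXISTS (SELECT 1
                     FROM transakce_tmp r
                     WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                          AND r.idTransakce = polozkyTransakci.idTransakce);

-- Check 1: should be 0, nothing of this type is left outside the protected transactions.
SELECT COUNT(*) AS unprotected_left
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
     AND NOT EXISTS (SELECT 1
                     FROM transakce_tmp r
                     WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                          AND r.idTransakce = pt.idTransakce);

-- Check 2: should match the count taken before the delete, the protected rows are still there.
SELECT COUNT(*) AS protected_kept
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
     AND EXISTS (SELECT 1
                 FROM transakce_tmp r
                 WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                      AND r.idTransakce = pt.idTransakce);

-- Undo everything so the next variant starts from the same data.
ROLLBACK;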

Now, while I agree that NOT EXISTS may be faster, some other things you want to look at are:

  • Do you have cascade deletes happening? If you end up deleting many child records, that could be part of the problem.
  • Are there triggers affecting the delete? In particular, check whether someone set one up to process rows one by one instead of as a set. Row-by-row triggers are very costly when you delete many records. For instance, suppose you are deleting 50K records and a delete trigger writes to an audit table. If it inserts one record at a time, it is executed 50K times. If it inserts all the deleted records in one step, that single insert might take a bit longer, but the total execution time is much shorter.
  • What indexing do you have, and is it helping the delete?
  • Examine the explain plan for each of your queries to see whether each change actually improves how the query will be executed.
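
The checks in this list could be run roughly as follows. This is only a sketch: it assumes you are connected to the schema that holds both tables, and as far as I know EXPLAIN on MariaDB 5.5 accepts only SELECT, so the plan is inspected on the equivalent SELECT rather than on the DELETE itself.

-- Triggers on the table (for SHOW TRIGGERS, LIKE matches the table name).
SHOW TRIGGERS LIKE 'polozkyTransakci';

-- Foreign keys in other tables that reference it, with their delete rule (e.g. CASCADE).
SELECT TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = DATABASE()
     AND REFERENCED_TABLE_NAME = 'polozkyTransakci';

-- Indexes that already exist on both tables.
SHOW INDEX FROM polozkyTransakci;
SHOW INDEX FROM transakce_tmp;

-- Execution plan of the SELECT that corresponds to the delete.
EXPLAIN
SELECT pt.idTransakce
FROM polozkyTransakci pt
WHERE pt.typPolozky = 'odpocetZalohy'
     AND NOT EXISTS (SELECT 1
                     FROM transakce_tmp r
                     WHERE r.puvodFaktury = 'FAKTURA VO CZ'
                          AND r.idTransakce = pt.idTransakce);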

Performance tuning is a complex subject, and it is best to read up on it in detail in one of the performance tuning books available for your specific database.

I might be inclined to write the query as a LEFT JOIN, although I'm guessing this would have the same execution plan as NOT EXISTS:

DELETE pt
    FROM polozkyTransakci pt LEFT JOIN
         transakce_tmp tt
         ON pt.idTransakce = tt.idTransakce AND
            tt.puvodFaktury = 'FAKTURA VO CZ'
    WHERE pt.typPolozky = 'odpocetZalohy' AND tt.idTransakce IS NULL;

I would recommend indexes, if you don't have them: polozkyTransakci(typPolozky, idTransakce) and transakce_tmp(idTransakce, puvodFaktury) . These would work on the NOT EXISTS version as well.
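
Those two indexes could be created like this. The index names are made up for illustration, and it is worth running SHOW INDEX on each table first in case an equivalent index already exists.

-- Lets the delete find the 'odpocetZalohy' rows and read idTransakce from the index.
CREATE INDEX idx_pt_typ_transakce
    ON polozkyTransakci (typPolozky, idTransakce);

-- Lets the lookup by idTransakce check puvodFaktury without touching the table rows.
CREATE INDEX idx_tt_transakce_puvod
    ON transakce_tmp (idTransakce, puvodFaktury);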

You can test the performance of these queries using SELECT :

    SELECT pt.*
    FROM polozkyTransakci pt LEFT JOIN
         transakce_tmp tt
         ON pt.idTransakce = tt.idTransakce AND
            tt.puvodFaktury = 'FAKTURA VO CZ'
    WHERE pt.typPolozky = 'odpocetZalohy' AND tt.idTransakce IS NULL;

The DELETE should be slower (due to the cost of logging transactions), but the performance should be comparable.

