MySQL query needs to be more efficient

Question

INSERT INTO payments (invoice_id)
SELECT id
FROM   invoice
WHERE  NOT EXISTS
  (SELECT invoice_id
   FROM   payments
   WHERE  payments.invoice_id = invoice.id)

This is taking about 35 seconds. There are times in production when a payment entry isn't created at the time an invoice is created. I need to manually create payment rows, containing only the invoice_id, for invoices that have no payment records.

Any help would be greatly appreciated.

Answer 1

How big (in rows) are your invoice and payments tables? You might be able to partition one of these tables or optimize them some other way. I would also recommend looking through the query plan (in your IDE) to check which step costs the most.

Answer 2

My bet is that it's taking a long time because the correlated subquery (in the EXISTS predicate) is being run for every row in the invoice table, and I suspect that an appropriate index isn't available.

But before we jump on the knee-jerk "add an index" bandwagon...

First, run an EXPLAIN on the SELECT (that is, EXPLAIN SELECT ...) and grab the output from that; it will show the execution plan for the query.
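For the query in the question, that would look something like the following. This is only a sketch; it uses just the table and column names given in the question and assumes nothing else about the schema.

```sql
-- Show the execution plan for the SELECT part only.
-- EXPLAIN does not insert any rows.
EXPLAIN
SELECT id
  FROM invoice
 WHERE NOT EXISTS
       ( SELECT invoice_id
           FROM payments
          WHERE payments.invoice_id = invoice.id
       );
```

In the output, the row for payments is the one to look at first. If its type is ALL and its rows estimate is large, that would be consistent with a full scan of payments for each row of invoice.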

(We strongly suspect that it's the SELECT that is slow, and that it's not really the actual INSERT with performance bogged down by horrendous INSERT triggers and such.)

I suggest re-writing the query to use an anti-join pattern. (This isn't a panacea, but sometimes we can get much better performance, and that mostly depends on having suitable indexes available.)

 SELECT i.id
   FROM invoice i
   LEFT
   JOIN payments p
     ON p.invoice_id = i.id
  WHERE p.invoice_id IS NULL

Before the WHERE clause is applied, the join returns all rows from invoice, along with any matching rows from payments. The LEFT keyword makes that an outer join; that means the join also returns rows from invoice that don't have a matching row in payments, with NULL in the payments columns.

The "trick" is the predicate in the WHERE clause. By specifying that we only return rows where the invoice_id from payments is NULL, we filter out every invoice row that had a matching row in payments.

We can run an EXPLAIN SELECT ... with that query. At a minimum, we're expecting to see the query making effective use of an index with a leading column of invoice_id, and "Using index" in the Extra column.
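As a rough sketch of that check: list the indexes on payments, then look at the plan for the anti-join. The CREATE INDEX at the end only applies if no existing index leads with invoice_id, and the index name payments_invoice_id_ix is made up for illustration.

```sql
-- List the indexes that currently exist on payments.
SHOW INDEX FROM payments;

-- Execution plan for the anti-join form of the query.
EXPLAIN
SELECT i.id
  FROM invoice i
  LEFT
  JOIN payments p
    ON p.invoice_id = i.id
 WHERE p.invoice_id IS NULL;

-- Run this only if SHOW INDEX shows no index with invoice_id as its
-- leading column. The index name is hypothetical; pick your own.
CREATE INDEX payments_invoice_id_ix ON payments (invoice_id);
```

After adding an index, re-run the EXPLAIN and compare the row for payments against the earlier output.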

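If memory serves me, MySQL may object to using this directly in an INSERT INTO payments, because the query references the same table. (That restriction definitely applies to UPDATE and DELETE with a subquery on the target table; for INSERT ... SELECT, the MySQL documentation says the target table may appear in the SELECT and an internal temporary table is used, so the direct form is worth trying first.) If it does complain, the workaround is to reference this query as an inline view...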

SELECT s.id
  FROM ( SELECT i.id
           FROM invoice i
           LEFT
           JOIN payments p
             ON p.invoice_id = i.id
          WHERE p.invoice_id IS NULL
       ) s

That does add some overhead, materializing the derived table. But it shouldn't be too bad for a relatively small set.

That query can serve as the row source for an INSERT INTO payments (invoice_id).
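Put together, the full statement would look something like this. It is a sketch that assumes, as the INSERT in the question already does, that every other column in payments is nullable or has a default.

```sql
-- Create a payments row for every invoice that has none.
INSERT INTO payments (invoice_id)
SELECT s.id
  FROM ( SELECT i.id
           FROM invoice i
           LEFT
           JOIN payments p
             ON p.invoice_id = i.id
          WHERE p.invoice_id IS NULL
       ) s;
```

Running the inner SELECT on its own first is a cheap way to confirm which invoice ids are going to be inserted.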

Without seeing the EXPLAIN output and the table definitions (including indexes), we're really just guessing what MySQL is doing. We really want to see how the SELECT performs; the INSERT can't run any faster than the SELECT does.

solution1
0 2015-04-30 19:29:33

solution2
0 2015-04-30 23:48:45
