MySQL Query Performance - Subquery Vs Join

Question

I'm not a SQL expert, so am trying to understand why two queries have substantially different execution times.

common_stats is a large table (3 million rows). I'm just trying to understand why the huge difference in performance.

The below query takes ~15 seconds:

select distinct cs.basesalary_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0 AND cs.pay_id is null );

This query takes ~.1 seconds:

select distinct basesalary_id from (
    select cs.basesalary_id, cs.pay_id
    from common_stats AS cs
    LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
    WHERE (b.employee_id= 100 AND cs.amount > 0)
) as temp
where pay_id is null;

Answer 1

As a general rule:

If the subquery 'shrinks' the number of rows (say, via GROUP BY or LIMIT ), the subquery approach is better.
JOIN often creates more rows than the original tables. DISTINCT is sometime used to deflate that row count.
If the outer query has a GROUP BY , JOIN may create more rows than you realize and "inflate" any aggregates ( SUM , COUNT , etc), thereby voting for subquery instead.
More than one subquery would lead to poor optimization. (Thereby voting for JOIN .)

All of those assume optimal indexes.

Your first query would probably benefit from

INDEX(pay_id, amount, basesalary_id) -- both "covering" and optimal for `WHERE`

Check out EXPLAIN SELECT ... for both queries. Probably the faster one starts with basesalary and there is INDEX(employee_id) and that is very selective.

I may have more comments after seeing SHOW CREATE TABLE .

MySQL Query Performance - Subquery Vs Join

Question

1 answers

solution1
0 2015-10-10 18:23:36

MySQL Query Performance - Subquery Vs Join

Question

1 answers

solution1 0 2015-10-10 18:23:36

solution1
0 2015-10-10 18:23:36