简体   繁体   中英

MySQL Query Performance - Subquery Vs Join

I'm not a SQL expert, so am trying to understand why two queries have substantially different execution times.

common_stats is a large table (3 million rows). I'm just trying to understand why the huge difference in performance.

The below query takes ~15 seconds:

select distinct cs.basesalary_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0 AND cs.pay_id is null );

This query takes ~.1 seconds:

select distinct basesalary_id from (
    select cs.basesalary_id, cs.pay_id
    from common_stats AS cs
    LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
    WHERE (b.employee_id= 100 AND cs.amount > 0)
) as temp
where pay_id is null;

As a general rule:

  • If the subquery 'shrinks' the number of rows (say, via GROUP BY or LIMIT ), the subquery approach is better.
  • JOIN often creates more rows than the original tables. DISTINCT is sometime used to deflate that row count.
  • If the outer query has a GROUP BY , JOIN may create more rows than you realize and "inflate" any aggregates ( SUM , COUNT , etc), thereby voting for subquery instead.
  • More than one subquery would lead to poor optimization. (Thereby voting for JOIN .)

All of those assume optimal indexes.

Your first query would probably benefit from

INDEX(pay_id, amount, basesalary_id) -- both "covering" and optimal for `WHERE`

Check out EXPLAIN SELECT ... for both queries. Probably the faster one starts with basesalary and there is INDEX(employee_id) and that is very selective.

I may have more comments after seeing SHOW CREATE TABLE .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM