I'm not a SQL expert, so am trying to understand why two queries have substantially different execution times.
common_stats is a large table (3 million rows). I'm just trying to understand why the huge difference in performance.
The below query takes ~15 seconds:
select distinct cs.basesalary_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0 AND cs.pay_id is null );
This query takes ~.1 seconds:
select distinct basesalary_id from (
select cs.basesalary_id, cs.pay_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0)
) as temp
where pay_id is null;
As a general rule:
GROUP BY
or LIMIT
), the subquery approach is better. JOIN
often creates more rows than the original tables. DISTINCT
is sometime used to deflate that row count. GROUP BY
, JOIN
may create more rows than you realize and "inflate" any aggregates ( SUM
, COUNT
, etc), thereby voting for subquery instead. JOIN
.) All of those assume optimal indexes.
Your first query would probably benefit from
INDEX(pay_id, amount, basesalary_id) -- both "covering" and optimal for `WHERE`
Check out EXPLAIN SELECT ...
for both queries. Probably the faster one starts with basesalary
and there is INDEX(employee_id)
and that is very selective.
I may have more comments after seeing SHOW CREATE TABLE
.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.