How can I optimize the following SQL query?

Right now it takes a very long time to run.

The query is:

select count(id), variety_id, name 
from tblItem 
where order_id IN (
    select order_id 
    from tblItem 
    where variety_id=4005 
    order by order_id DESC) 
AND variety_id != 4005 
GROUP BY variety_id 
order by count(id) DESC
LIMIT 5;

I have indexes on variety_id and order_id. I'm basically trying to build a recommendation engine: the query looks for the top 5 items people buy when they also bought variety_id 4005. But like I said, it takes way too long to run.

Does anyone have a way to optimize this query?

Try this:

select count(t1.id), t1.variety_id, t1.name 
from tblItem t1
inner join tblItem t2 ON t2.order_id = t1.order_id and t2.variety_id = 4005
where t1.variety_id != 4005 
GROUP BY t1.variety_id, t1.name
ORDER BY count(t1.id) DESC 
LIMIT 5;
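
To see whether the join actually beats the IN subquery on your MySQL version, you can compare the two plans with EXPLAIN. This is only a sketch against the tblItem table from the question. The subquery's ORDER BY is dropped because it has no effect inside IN, and name is added to GROUP BY so the statement is accepted under ONLY_FULL_GROUP_BY. The index name idx_item_variety_order is made up for illustration, and whether a composite index helps depends on your data.

```sql
-- Plan for the IN (subquery) form. On older MySQL versions the inner
-- SELECT may show up as a DEPENDENT SUBQUERY, i.e. re-evaluated per row.
EXPLAIN
SELECT COUNT(id), variety_id, name
FROM tblItem
WHERE order_id IN (SELECT order_id FROM tblItem WHERE variety_id = 4005)
  AND variety_id != 4005
GROUP BY variety_id, name
ORDER BY COUNT(id) DESC
LIMIT 5;

-- Plan for the self-join form from this answer.
EXPLAIN
SELECT COUNT(t1.id), t1.variety_id, t1.name
FROM tblItem t1
INNER JOIN tblItem t2 ON t2.order_id = t1.order_id AND t2.variety_id = 4005
WHERE t1.variety_id != 4005
GROUP BY t1.variety_id, t1.name
ORDER BY COUNT(t1.id) DESC
LIMIT 5;

-- Assumption: a composite index lets MySQL find the order_ids for one
-- variety_id from the index alone. The index name is hypothetical.
CREATE INDEX idx_item_variety_order ON tblItem (variety_id, order_id);
```

Compare the type, key and rows columns of the two plans before and after adding the index; if the estimated row counts do not drop, the index is probably not worth keeping.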

I've often found that MySQL optimizes WHERE ... IN (SELECT ...) poorly, and JOIN works better; I've read that recent MySQL versions are better, so it may be version-dependent. Also, you should use COUNT(*) unless the column can be NULL and you need to ignore the null values in the count.

SELECT COUNT(*) AS cnt, t1.variety_id, t1.name
FROM tblItem AS t1
JOIN (SELECT DISTINCT order_id      -- each order containing 4005, listed once
      FROM tblItem
      WHERE variety_id = 4005) AS t2
ON t1.order_id = t2.order_id
WHERE t1.variety_id != 4005
GROUP BY t1.variety_id, t1.name
ORDER BY cnt DESC
LIMIT 5;

The subquery with DISTINCT is needed because an order can contain several rows with variety_id 4005; without it, the join would multiply each count by the number of matching rows in that order.
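
The effect is easy to reproduce on a tiny table. The following is a sketch only: tblItem_demo, its column types and the sample rows are made up for illustration and are not the real schema. Order 1 contains two rows for variety 4005, so a plain self-join counts the other item in that order twice.

```sql
-- Hypothetical miniature of tblItem; column types are assumptions.
CREATE TABLE tblItem_demo (
    id         INT PRIMARY KEY,
    order_id   INT NOT NULL,
    variety_id INT NOT NULL,
    name       VARCHAR(50) NOT NULL
);

-- Order 1: two rows of variety 4005 plus one Gadget.
-- Order 2: one row of variety 4005 plus one Gadget.
INSERT INTO tblItem_demo (id, order_id, variety_id, name) VALUES
    (1, 1, 4005, 'Widget'),
    (2, 1, 4005, 'Widget'),
    (3, 1, 7001, 'Gadget'),
    (4, 2, 4005, 'Widget'),
    (5, 2, 7001, 'Gadget');

-- Plain self-join: row 3 pairs with rows 1 and 2, row 5 pairs with row 4,
-- so Gadget is counted 3 times although it appears in only 2 orders.
SELECT COUNT(*) AS cnt, t1.variety_id, t1.name
FROM tblItem_demo AS t1
JOIN tblItem_demo AS t2
  ON t2.order_id = t1.order_id AND t2.variety_id = 4005
WHERE t1.variety_id != 4005
GROUP BY t1.variety_id, t1.name;

-- DISTINCT derived table: each qualifying order joins once,
-- so Gadget is counted 2 times.
SELECT COUNT(*) AS cnt, t1.variety_id, t1.name
FROM tblItem_demo AS t1
JOIN (SELECT DISTINCT order_id
      FROM tblItem_demo
      WHERE variety_id = 4005) AS t2
  ON t1.order_id = t2.order_id
WHERE t1.variety_id != 4005
GROUP BY t1.variety_id, t1.name;
```

If your data guarantees at most one row per variety per order, both queries return the same counts and the plain join is enough.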
