Why is MySQL JOIN significantly faster than WHERE IN (subquery)?

I am trying to better understand why this query optimization is so significant (over 100 times faster) so that I can reuse similar logic for other queries.

Using MySQL 4.1. RESET QUERY CACHE and FLUSH TABLES were run before every query, and the timings can be reproduced consistently. The only thing that is obvious to me in the EXPLAIN output is that only 5 rows have to be found during the JOIN. But is that the whole answer to the speed? Both queries use a partial index (forum_stickies) to determine deleted topic status (topic_status=0).

Screenshots for deeper analysis with EXPLAIN

Slow query: 0.7+ seconds (cache cleared)

SELECT SQL_NO_CACHE forum_id, topic_id FROM bb_topics 
WHERE topic_last_post_id IN 
(SELECT SQL_NO_CACHE MAX(topic_last_post_id) AS topic_last_post_id
FROM bb_topics WHERE topic_status=0 GROUP BY forum_id)

Fast query: 0.004 seconds or less (cache cleared)

SELECT SQL_NO_CACHE forum_id, topic_id FROM bb_topics AS s1 
JOIN 
(SELECT SQL_NO_CACHE MAX(topic_last_post_id) AS topic_last_post_id
FROM bb_topics WHERE topic_status=0 GROUP BY forum_id) AS s2 
ON s1.topic_last_post_id=s2.topic_last_post_id  

Note that there is no index on the most important column (topic_last_post_id), but that cannot be helped (the results are stored for repeated use anyway).

Is the answer simply that the first query has to scan topic_last_post_id twice, the second time to match the rows against the subquery results? If so, why is it exponentially slower?

(Less important: I am curious why the first query still takes so long even if I do put an index on topic_last_post_id.)

Update: after much more searching I found this thread on Stack Overflow, which goes into this topic: Subqueries vs joins

Maybe the engine executes the subquery for every row in bb_topics, just to see whether it finds the topic_last_post_id in the results. That would be stupid, but it would also explain the huge difference.
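To make that guess concrete, here is a rough Python sketch of the two evaluation strategies. It is a toy cost model, not MySQL's actual executor: the row layout, the made-up data and the function names (make_rows, where_in_per_row, join_derived_table) are all assumptions for illustration. It only counts how many rows each strategy reads if the inner query really is re-run once per outer row.

```python
# Toy rows: (topic_id, forum_id, topic_status, topic_last_post_id).
# The data is invented; only the shape mirrors bb_topics.
def make_rows(n_topics=500, n_forums=5):
    return [(i, i % n_forums, 0, 10_000 + i) for i in range(n_topics)]


def max_last_post_per_forum(rows, counter):
    """Inner query: MAX(topic_last_post_id) WHERE topic_status=0 GROUP BY forum_id."""
    best = {}
    for _, forum_id, status, last_post in rows:
        counter["rows_read"] += 1
        if status == 0 and last_post > best.get(forum_id, -1):
            best[forum_id] = last_post
    return set(best.values())


def where_in_per_row(rows):
    """Hypothesized slow plan: the inner query is re-run for every outer row."""
    counter = {"rows_read": 0}
    result = []
    for topic_id, forum_id, _, last_post in rows:
        counter["rows_read"] += 1
        if last_post in max_last_post_per_forum(rows, counter):
            result.append((forum_id, topic_id))
    return result, counter["rows_read"]


def join_derived_table(rows):
    """Fast plan: the inner query runs once, then the outer rows are matched against it."""
    counter = {"rows_read": 0}
    wanted = max_last_post_per_forum(rows, counter)  # materialized a single time
    result = []
    for topic_id, forum_id, _, last_post in rows:
        counter["rows_read"] += 1
        if last_post in wanted:
            result.append((forum_id, topic_id))
    return result, counter["rows_read"]


if __name__ == "__main__":
    rows = make_rows()
    slow_result, slow_reads = where_in_per_row(rows)
    fast_result, fast_reads = join_derived_table(rows)
    print(slow_result == fast_result, slow_reads, fast_reads)
```

With the default 500 rows both strategies return the same 5 rows, but the per-row strategy reads 250,500 rows against 1,000 for the join. Under this model the extra cost grows with the square of the table size (quadratic rather than exponential), which would be consistent with a gap of 100 times or more on a table of a few hundred rows or larger.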

