Sub-query Optimization Talk with an example case
I need advice and want to share my experience with query optimization. This week, I found myself stuck in an interesting dilemma.
I'm a novice in MySQL (2 years of theory, less than one of practice).
Environment:
I have a table articles that contains articles with a column 'type', a second table article_version that contains the date on which each article was added to the DB, and a third table type_document that contains all the article types along with their labels and so on.
The first 2 tables are huge (800,000+ rows and growing daily); the 3rd one is naturally small. The articles table has a lot of columns, but to simplify things we only need 'ID' and 'type' in articles and 'dateAdded' in article_version.
What I want to do:
A query that, for a specified 'dateAdded', returns the number of articles for each type (there are ~50 types to scan). What was already in place was 50 separate counts, one for each document type oO (not efficient, and slow: ~5 sec in general).
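For reference, each of the ~50 existing queries presumably looked something like the sketch below. This is an assumption based on the description above, not the actual production query; the aliases a and av and the type value '123456' are placeholders.

```sql
-- Hypothetical reconstruction of ONE of the ~50 per-type counts.
-- The application would run a query like this once per document type.
SELECT COUNT(DISTINCT a.ID) AS nbrArti
FROM articles a
INNER JOIN article_version av
        ON av.ARTI_ID = a.ID
WHERE a.type = '123456'                      -- placeholder type value
  AND av.dateAdded = '2009-01-01 00:00:00';  -- the requested date
```

Each run repeats the same join and pays a separate round trip to the server, which is the cost a single combined query is meant to remove.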
I wanted to do it all in one query, and I came up with this:
SELECT type,
       (SELECT COUNT(DISTINCT articles.ID)
        FROM articles
        INNER JOIN article_version
                ON article_version.ARTI_ID = articles.ID
        WHERE type = td.NEW_ID
          AND dateAdded = '2009-01-01 00:00:00') AS nbrArti
FROM type_document td
WHERE td.NEW_ID != ''
GROUP BY td.NEW_ID;
The outer select (on type_document) gets me the 55 types of documents I need. The sub-query counts the articles for each type_document for the given date '2009-01-01'.
A typical result looks like this:
* type     * nbrArti    *
*************************
* 123456   * 23         *
* 789456   * 5          *
* 16578    * 98         *
* ....     * ....       *
* ....     * ....       *
*************************
This query gets the job done, but the join in the sub-query makes it extremely slow. The reason, if I'm right, is that the server performs the join once for each type, so 50+ times. This solution is even slower than running the 50 queries independently, one per type. Awesome :/
A Solution
I came up with a solution myself that drastically improves performance with the same result: I just created a view corresponding to the sub-query, doing the join on IDs for each type... And boom, it's fast.
I think, correct me if I'm wrong, that the reason is that the server only runs the JOIN once.
This solution is ~5 times faster than the solution that was already in place, and ~20 times faster than my first attempt. Sweet.
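The view itself is not shown in the post, so the following is only one possible reading of "a view corresponding to the sub-query". The view name article_with_date and the aliases are made up for illustration; the table and column names are the ones used in the query above.

```sql
-- Hypothetical view: the join from the sub-query, defined once.
CREATE VIEW article_with_date AS
SELECT a.ID, a.type, av.dateAdded
FROM articles a
INNER JOIN article_version av
        ON av.ARTI_ID = a.ID;

-- The per-type count then reads from the view instead of repeating the join.
SELECT td.NEW_ID AS type,
       (SELECT COUNT(DISTINCT v.ID)
        FROM article_with_date v
        WHERE v.type = td.NEW_ID
          AND v.dateAdded = '2009-01-01 00:00:00') AS nbrArti
FROM type_document td
WHERE td.NEW_ID != '';
```

Whether this shape reproduces the reported speed-up depends on how the server processes the view, so it is worth comparing the execution plans with EXPLAIN before and after.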
Questions / thoughts
Apologies for my approximate English; it is not my primary language.
You cannot create a single index on (type, date_added), because these fields are in different tables.
Without the view, the subquery most probably selects articles as the leading table and uses the index on type, which is not very selective.
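One way to check which table the optimizer actually leads with is EXPLAIN. The sketch below reuses the question's table and column names; the type value is a placeholder, and it assumes indexes exist on articles.type and article_version.dateAdded, which the post implies but does not show.

```sql
-- EXPLAIN lists the tables in the order MySQL reads them,
-- so the first row of the output is the leading table.
-- The `key` column shows the index chosen, `rows` the estimated row count.
EXPLAIN
SELECT COUNT(DISTINCT a.ID)
FROM articles a
INNER JOIN article_version av
        ON av.ARTI_ID = a.ID
WHERE a.type = '123456'                      -- placeholder type value
  AND av.dateAdded = '2009-01-01 00:00:00';

-- As an experiment, STRAIGHT_JOIN forces the join order written in FROM,
-- here starting from article_version so the date filter is applied first.
SELECT STRAIGHT_JOIN COUNT(DISTINCT a.ID)
FROM article_version av
INNER JOIN articles a
        ON a.ID = av.ARTI_ID
WHERE a.type = '123456'
  AND av.dateAdded = '2009-01-01 00:00:00';
```

If the forced order is clearly faster, that supports the idea that the low selectivity of type is what hurts the original sub-query.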
By creating the view, you force the subquery to calculate the sums for all types first (using a selective index on date) and then use a JOIN BUFFER (which is fast enough for only 55 types).
You can achieve similar results by rewriting your query like this:
SELECT new_id, COALESCE(cnt, 0) AS cnt
FROM type_document td
LEFT JOIN
        (
        SELECT type, COUNT(DISTINCT article_id) AS cnt
        FROM article_versions av
        JOIN articles a
          ON a.id = av.article_id
        WHERE av.date = '2009-01-01 00:00:00'
        GROUP BY
                type
        ) q
  ON q.type = td.new_id
Unfortunately, MySQL is not able to do table spools, and before version 8.0.18 it has no hash joins either, so to improve the performance you'll need to denormalize your tables: add type to article_version and create a composite index on (date, type).
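A minimal sketch of that denormalization, using the question's names (article_version, ARTI_ID, dateAdded, NEW_ID) rather than the shortened ones in the query above. The VARCHAR(32) column type and the index name are assumptions; match the column type to the real definition of articles.type.

```sql
-- 1. Copy the type column onto article_version (column type is an assumption).
ALTER TABLE article_version
  ADD COLUMN type VARCHAR(32) NULL;

-- 2. Backfill it from articles (MySQL multi-table UPDATE syntax).
UPDATE article_version av
JOIN articles a
  ON a.ID = av.ARTI_ID
SET av.type = a.type;

-- 3. Composite index so one date can be counted per type from the index.
CREATE INDEX idx_article_version_date_type
  ON article_version (dateAdded, type);

-- 4. The count no longer needs to touch the articles table at all.
SELECT td.NEW_ID AS type, COALESCE(q.cnt, 0) AS nbrArti
FROM type_document td
LEFT JOIN
        (
        SELECT av.type, COUNT(DISTINCT av.ARTI_ID) AS cnt
        FROM article_version av
        WHERE av.dateAdded = '2009-01-01 00:00:00'
        GROUP BY av.type
        ) q
  ON q.type = td.NEW_ID
WHERE td.NEW_ID != '';
```

The trade-off is that the copied column has to be kept in sync whenever an article's type changes, either in application code or with a trigger.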