Subquery optimization: a worked example

Question

I need advice and want to share my experience with query optimization. This week I found myself stuck in an interesting dilemma. I'm a novice in MySQL (2 years of theory, less than one of practice).

Environment:

I have a table articles with a column 'type', a second table article_version that holds the date on which an article was added to the DB, and a third table that contains all the article types along with their labels and so on.

The first two tables are huge (800,000+ rows and growing daily); the third is naturally small. The articles table has a lot of columns, but to keep things simple we only need 'ID' and 'type' in articles and 'dateAdded' in article_version.

What I want to do:

A query that, for a specified 'dateAdded', returns the number of articles for each type (there are ~50 types to scan). What was already in place was 50 separate counts, one per document type (not efficient, and slow: ~5 sec in general).

I wanted to do it all in one query, and I came up with this:

SELECT td.NEW_ID AS type,
  (SELECT COUNT(DISTINCT articles.ID)
    FROM articles
      INNER JOIN article_version
        ON article_version.ARTI_ID = articles.ID
    WHERE articles.type = td.NEW_ID
      AND article_version.dateAdded = '2009-01-01 00:00:00') AS nbrArti
FROM type_document td
WHERE td.NEW_ID != ''
GROUP BY td.NEW_ID;

The outer select (on type_document) gets me the 55 document types I need. The subquery counts the articles of each type for the given date, '2009-01-01'.

A typical result looks like:

*   type   *  nbrArti   *
*************************
* 123456   * 23         *
* 789456   * 5          *
* 16578    * 98         *
* ....     * ....       *
* ....     * ....       *
*************************

This query gets the job done, but the join in the subquery makes it extremely slow. The reason, if I'm right, is that the server performs the join once for each type, so 50+ times. This solution is even slower than running the 50 queries independently, one per type. Awesome :/

A Solution

I came up with a solution myself that drastically improves performance with the same result: I created a view corresponding to the subquery, which does the join on IDs for every type... and boom, it's fast.

I think (correct me if I'm wrong) the reason is that the server now runs the JOIN only once.

This solution is ~5 times faster than the solution that was already there, and ~20 times faster than my first attempt. Sweet.
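
The post does not show the view itself, so here is a minimal sketch of what it might look like. The view name article_count_by_type_date and the choice to group by (type, dateAdded) are assumptions, not the author's actual definition. The SQL is run against an in-memory SQLite database from Python only to keep the example self-contained; MySQL's planner behaves differently, so timings there should be checked with EXPLAIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_document   (NEW_ID TEXT);
CREATE TABLE articles        (ID INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE article_version (ARTI_ID INTEGER, dateAdded TEXT);

INSERT INTO type_document   VALUES ('123456'), ('789456'), ('');
INSERT INTO articles        VALUES (1, '123456'), (2, '123456'), (3, '789456');
INSERT INTO article_version VALUES
    (1, '2009-01-01 00:00:00'),
    (1, '2009-01-01 00:00:00'),  -- second version of article 1, same date
    (2, '2009-01-01 00:00:00'),
    (3, '2009-02-01 00:00:00');

-- Hypothetical view: the join and the count are written once, per (type, date).
CREATE VIEW article_count_by_type_date AS
SELECT a.type, av.dateAdded, COUNT(DISTINCT a.ID) AS nbrArti
FROM articles a
INNER JOIN article_version av ON av.ARTI_ID = a.ID
GROUP BY a.type, av.dateAdded;
""")

# One row per document type; types with no article on that date get 0.
rows = conn.execute("""
SELECT td.NEW_ID AS type, COALESCE(v.nbrArti, 0) AS nbrArti
FROM type_document td
LEFT JOIN article_count_by_type_date v
       ON v.type = td.NEW_ID AND v.dateAdded = ?
WHERE td.NEW_ID != ''
ORDER BY td.NEW_ID
""", ("2009-01-01 00:00:00",)).fetchall()

print(rows)
```

The LEFT JOIN keeps document types that have no article on the requested date, which the per-type counts also reported as 0.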

Questions / thoughts

With yet another view, I now need to check that I don't lose more than I gain when documents get inserted...
Is there a way to improve the original query by getting the JOIN out of the subquery (and getting rid of the view)?
Any other tips/thoughts? (On server tuning, for example...)

Apologies for my approximate English; it is not my primary language.

Answer 1

You cannot create a single index on (type, date_added), because these fields are in different tables.

Without the view, the subquery most probably picks articles as the leading table and uses the index on type, which is not very selective.

By creating the view, you force the subquery to calculate the counts for all types first (using the selective index on date) and then use a JOIN BUFFER (which is fast enough for only 55 types).

You can achieve similar results by rewriting your query like this:

SELECT  new_id, COALESCE(cnt, 0) AS cnt
FROM    type_document td
LEFT JOIN
        (
        SELECT  type, COUNT(DISTINCT article_id) AS cnt
        FROM    article_versions av
        JOIN    articles a
        ON      a.id = av.article_id
        WHERE   av.date = '2009-01-01 00:00:00'
        GROUP BY
                type
        ) q
ON      q.type = td.new_id
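
To check that this rewrite returns the same counts as the correlated subquery from the question, something like the sketch below can be used. It keeps the table and column names of the query above, runs on a tiny made-up data set, and uses in-memory SQLite from Python purely for convenience; it says nothing about how MySQL will execute either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_document    (new_id TEXT);
CREATE TABLE articles         (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE article_versions (article_id INTEGER, date TEXT);

-- Made-up sample data: type C has no article on the requested date.
INSERT INTO type_document    VALUES ('A'), ('B'), ('C');
INSERT INTO articles         VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'C');
INSERT INTO article_versions VALUES
    (1, '2009-01-01 00:00:00'),
    (1, '2009-01-01 00:00:00'),
    (2, '2009-01-01 00:00:00'),
    (3, '2009-01-01 00:00:00'),
    (4, '2009-03-15 00:00:00');
""")

DAY = "2009-01-01 00:00:00"

# Form 1: correlated subquery, evaluated once per document type.
correlated = conn.execute("""
SELECT td.new_id,
       (SELECT COUNT(DISTINCT a.id)
        FROM articles a
        JOIN article_versions av ON av.article_id = a.id
        WHERE a.type = td.new_id AND av.date = ?) AS cnt
FROM type_document td
ORDER BY td.new_id
""", (DAY,)).fetchall()

# Form 2: aggregate once in a derived table, then LEFT JOIN the small table.
derived = conn.execute("""
SELECT td.new_id, COALESCE(q.cnt, 0) AS cnt
FROM type_document td
LEFT JOIN (
    SELECT a.type, COUNT(DISTINCT av.article_id) AS cnt
    FROM article_versions av
    JOIN articles a ON a.id = av.article_id
    WHERE av.date = ?
    GROUP BY a.type
) q ON q.type = td.new_id
ORDER BY td.new_id
""", (DAY,)).fetchall()

print(correlated == derived, derived)
```

The COALESCE is what makes the two forms agree: the derived table has no row for a type without articles on that date, while COUNT in the correlated form already returns 0.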

Unfortunately, MySQL is not able to do table spools or hash joins, so to improve the performance you'll need to denormalize your tables: add type to article_version and create a composite index on (date, type).
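
A rough sketch of that denormalization, under some assumptions: the index name idx_version_date_type is made up, the column names follow the query above, and the backfill is a one-off UPDATE. It runs on in-memory SQLite from Python so it is self-contained; in MySQL the same idea applies, but the copied type column would also have to be kept in sync on every insert or update (in application code or a trigger), which is the insert-time cost to measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles        (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE article_version (article_id INTEGER, date TEXT);

-- Made-up sample data.
INSERT INTO articles        VALUES (1, 'A'), (2, 'A'), (3, 'B');
INSERT INTO article_version VALUES
    (1, '2009-01-01 00:00:00'),
    (1, '2009-01-01 00:00:00'),
    (2, '2009-01-01 00:00:00'),
    (3, '2009-01-01 00:00:00'),
    (3, '2009-03-15 00:00:00');

-- Denormalize: copy each article's type onto its version rows.
ALTER TABLE article_version ADD COLUMN type TEXT;
UPDATE article_version
SET type = (SELECT a.type FROM articles a
            WHERE a.id = article_version.article_id);

-- Composite index: the filtered column first, then the grouped column.
CREATE INDEX idx_version_date_type ON article_version (date, type);
""")

# The per-type count no longer needs to touch the articles table at all.
counts = conn.execute("""
SELECT type, COUNT(DISTINCT article_id) AS cnt
FROM article_version
WHERE date = ?
GROUP BY type
ORDER BY type
""", ("2009-01-01 00:00:00",)).fetchall()

print(counts)
```

Joining this result to type_document with a LEFT JOIN, as in the rewritten query, would bring back the types whose count is 0.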

Answer 1: accepted, 2009-11-27 16:55:34
