Optimizing a MySQL query that takes about 20 seconds!

Question

I'm running the following query on a MacBook Pro (2.53 GHz, 4 GB of RAM):

SELECT
    c.id            AS id,
    c.name          AS name,
    c.parent_id     AS parent_id,
    s.domain        AS domain_name,
    s.domain_id     AS domain_id,
    NULL            AS stats
FROM
    stats s
LEFT JOIN stats_id_category sic ON s.id = sic.stats_id
LEFT JOIN categories c ON c.id = sic.category_id
GROUP BY
    c.name

It takes about 17 seconds to complete.

EXPLAIN:

Image: http://img7.imageshack.us/img7/1364/picture1va.png

The tables:

Information:

Number of rows: 147397
Data size: 20.3MB
Index size: 1.4MB

Table:

CREATE TABLE `stats` (
    `id` int(11) unsigned NOT NULL auto_increment,
    `time` int(11) NOT NULL,
    `domain` varchar(40) NOT NULL,
    `ip` varchar(20) NOT NULL,
    `user_agent` varchar(255) NOT NULL,
    `domain_id` int(11) NOT NULL,
    `date` timestamp NOT NULL default CURRENT_TIMESTAMP,
    `referrer` varchar(400) default NULL,
    KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147398 DEFAULT CHARSET=utf8

Information for the second table:

Number of rows: 1285093
Data size: 11MB
Index size: 17.5MB

Second table:

CREATE TABLE `stats_id_category` (
    `stats_id` int(11) NOT NULL,
    `category_id` int(11) NOT NULL,
    KEY `stats_id` (`stats_id`,`category_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Information for the third table:

Number of rows: 161
Data size: 3.9KB
Index size: 8KB

Third table:

CREATE TABLE `categories` (
    `id` int(11) NOT NULL auto_increment,
    `parent_id` int(11) default NULL,
    `name` varchar(40) NOT NULL,
    `questions_category_id` int(11) NOT NULL default '0',
    `rank` int(2) NOT NULL default '0',
    PRIMARY KEY  (`id`),    
    KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=205 DEFAULT CHARSET=latin1

Hopefully someone can help me speed this up.

Answer 1

I see several WTFs in your query:

1. You use two LEFT OUTER JOINs, but then you group by the c.name column, which might have no matches. So perhaps you don't really need an outer join? If that's the case, you should use an inner join, because outer joins are often slower.
2. You are grouping by c.name, but this gives ambiguous results for every other column in your select-list. That is, there might be multiple values in these columns within each c.name group. You're lucky you're using MySQL, because this query would simply give an error in most other RDBMSs.
3. This is a performance issue because the GROUP BY is likely causing the "Using temporary; Using filesort" you see in the EXPLAIN. This is a notorious performance-killer, and it's probably the single biggest reason this query is taking 17 seconds. Since it's not clear why you're using GROUP BY at all (using no aggregate functions, and violating the Single-Value Rule), it seems like you need to rethink this.
4. You are grouping by c.name, which doesn't have a UNIQUE constraint on it. You could in theory have multiple categories with the same name, and these would be lumped together in a group. I wonder why you don't group by c.id if you want one group per category.
5. SELECT NULL AS stats: I don't understand why you need this. It's kind of like creating a variable that you never use. It shouldn't harm performance, but it's just another WTF that makes me think you haven't thought this query through very well.
6. You say in a comment you're looking for the number of visitors per category. But your query doesn't have any aggregate functions like SUM() or COUNT(). And your select-list includes s.domain and s.domain_id, which would be different for every visitor, right? So what value do you expect to be in the result set if you only have one row per category? This isn't really a performance issue either; it just means your query results don't tell you anything useful.
7. Your stats_id_category table has an index over its two columns, but no primary key. So you can easily get duplicate rows, and this means your count of visitors may be inaccurate. You need to drop that redundant index and use a primary key instead. I'd order category_id first in that primary key, so the join can take advantage of the index.
```
ALTER TABLE stats_id_category
    DROP KEY stats_id,
    ADD PRIMARY KEY (category_id, stats_id);
```
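
One caveat with the ALTER TABLE above: MySQL rejects ADD PRIMARY KEY with a duplicate-entry error if the table already holds repeated (category_id, stats_id) pairs. The following is only a sketch of how one might check for that first; the table name stats_id_category_dedup is made up for illustration and does not appear in the question.

```sql
-- List every (category_id, stats_id) pair that is stored more than once.
SELECT category_id, stats_id, COUNT(*) AS copies
FROM stats_id_category
GROUP BY category_id, stats_id
HAVING COUNT(*) > 1;

-- If the query above returns rows, one possible way to de-duplicate is to
-- copy the distinct pairs into a new table (hypothetical name).
CREATE TABLE stats_id_category_dedup LIKE stats_id_category;

INSERT INTO stats_id_category_dedup (stats_id, category_id)
SELECT DISTINCT stats_id, category_id
FROM stats_id_category;
```

If the first query returns no rows, the ALTER TABLE can be run directly. Otherwise the de-duplicated copy would need to replace the original table before the primary key is added.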

Now you can eliminate one of your joins, if all you need to count is the number of visitors:

SELECT c.id, c.name, c.parent_id, COUNT(*) AS num_visitors
FROM categories c
INNER JOIN stats_id_category sic ON (sic.category_id = c.id)
GROUP BY c.id;

Now the query doesn't need to read the stats table at all, or even the data rows of the stats_id_category table. It can get its count simply by reading the index of the stats_id_category table, which should eliminate a lot of work.
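
To check whether the index-only read actually happens, one could run EXPLAIN on the rewritten query. This sketch assumes the primary key on (category_id, stats_id) has already been added; the plan MySQL chooses still depends on the server version and the data.

```sql
-- Ask MySQL for the execution plan of the rewritten query.
-- For the sic row, "Using index" in the Extra column indicates that the
-- count is served from the index without touching the table's data rows.
EXPLAIN
SELECT c.id, c.name, c.parent_id, COUNT(*) AS num_visitors
FROM categories c
INNER JOIN stats_id_category sic ON (sic.category_id = c.id)
GROUP BY c.id;
```

If "Using temporary; Using filesort" still shows up, that is a hint that the grouping is not being resolved the way this answer expects.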

Answer 2

You are missing the third table in the information provided (categories).

Also, it seems odd that you are doing a LEFT JOIN and then using the right table (which might be all NULLs) in the GROUP BY. You will end up grouping all of the non-matching rows together as a result. Is that what you intended?

Finally, can you provide an EXPLAIN for the SELECT?
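
To see how many rows would fall into that single NULL group, a query along these lines could be used. It is a sketch against the tables from the question and only counts stats rows that have no entry in stats_id_category.

```sql
-- Count stats rows without any category assignment. With the original
-- LEFT JOINs, all of these rows end up in one group where c.name IS NULL.
SELECT COUNT(*) AS uncategorized
FROM stats s
LEFT JOIN stats_id_category sic ON s.id = sic.stats_id
WHERE sic.stats_id IS NULL;
```

If this returns 0, the outer joins add nothing and inner joins would give the same result.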

Answer 3

Harrison is right; we need the other table. I would start by adding an index on category_id to stats_id_category, though.
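
A minimal sketch of that suggestion follows. The index name idx_category_id is an arbitrary choice, not something from the question. The existing index starts with stats_id, so it cannot serve lookups that filter on category_id alone.

```sql
-- Add a secondary index so joins from categories into stats_id_category
-- can look rows up by category_id (index name is hypothetical).
ALTER TABLE stats_id_category
    ADD INDEX idx_category_id (category_id);
```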

Answer 4

I agree with Bill. Point 2 is very important. The query doesn't even make logical sense. Also, the simple fact that there is no WHERE clause means that you have to pull back every row in the stats table, which seems to be around 140,000. It then has to sort all that data so that it can perform the GROUP BY. This is because sorting [O(n log n)] and then finding duplicates [O(n)] is much faster than finding duplicates without sorting the data set [O(n^2)?].
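
If only a recent time window matters, a WHERE clause would shrink the set of rows that has to be joined and grouped. The sketch below is purely illustrative: the cutoff date and the index name idx_date are assumptions, and the question does not say whether such a filter fits the use case.

```sql
-- Hypothetical index; the original stats table has no index on `date`.
ALTER TABLE stats ADD INDEX idx_date (`date`);

-- Count visitors per category for a limited period only.
SELECT c.id, c.name, COUNT(*) AS num_visitors
FROM stats s
INNER JOIN stats_id_category sic ON sic.stats_id = s.id
INNER JOIN categories c ON c.id = sic.category_id
WHERE s.`date` >= '2009-09-01'   -- example cutoff, not from the question
GROUP BY c.id, c.name;
```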

Answer 1
3 2009-09-05 18:17:46

Answer 2
0 2009-09-05 05:33:55

Answer 3
0 2009-09-05 06:00:49

Answer 4
0 2009-09-05 18:25:18