Please help me optimize this MySQL SELECT statement

Question

I have a query that takes roughly four minutes to run on a high-powered SSD server with no other notable processes running. I'd like to make it faster if possible.

The database stores a match history for a popular video game called Dota 2. In this game, ten players (five on each team) each select a "hero" and battle it out.

The intention of my query is to create a list of past matches along with how much of an "XP dependence" each team had, based on the heroes used. With 200,000 matches (and a 2,000,000-row matches-to-heroes relationship table) the query takes about four minutes. With 1,000,000 matches, it takes roughly 15 minutes.

I have full control of the server, so any configuration suggestions are also appreciated. Thanks for any help, guys. Here are the details...

CREATE TABLE matches (
    match_id BIGINT UNSIGNED NOT NULL,
    start_time INT UNSIGNED NOT NULL,
    skill_level TINYINT NOT NULL DEFAULT -1,
    winning_team TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (match_id),
    KEY start_time (start_time),
    KEY skill_level (skill_level),
    KEY winning_team (winning_team));

CREATE TABLE heroes (
    hero_id SMALLINT UNSIGNED NOT NULL,
    name CHAR(40) NOT NULL DEFAULT '',
    faction TINYINT NOT NULL DEFAULT -1,
    primary_attribute TINYINT NOT NULL DEFAULT -1,
    group_index TINYINT NOT NULL DEFAULT -1,
    match_count BIGINT UNSIGNED NOT NULL DEFAULT 0,
    win_count BIGINT UNSIGNED NOT NULL DEFAULT 0,
    xp_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
    team_xp_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
    xp_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
    team_xp_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
    gold_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
    team_gold_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
    gold_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
    team_gold_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
    included TINYINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (hero_id));

CREATE TABLE matches_heroes (
    match_id BIGINT UNSIGNED NOT NULL,
    player_id INT UNSIGNED NOT NULL,
    hero_id SMALLINT UNSIGNED NOT NULL,
    xp_per_min SMALLINT UNSIGNED NOT NULL,
    gold_per_min SMALLINT UNSIGNED NOT NULL,
    position TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (match_id, hero_id),
    KEY match_id (match_id),
    KEY player_id (player_id),
    KEY hero_id (hero_id),
    KEY xp_per_min (xp_per_min),
    KEY gold_per_min (gold_per_min),
    KEY position (position));

Query

SELECT
    matches.match_id,
    SUM(CASE     
        WHEN position < 5 THEN xp_from_wins / team_xp_from_wins     
        ELSE 0    
    END) AS radiant_xp_dependence,
    SUM(CASE     
        WHEN position >= 5 THEN xp_from_wins / team_xp_from_wins     
        ELSE 0    
    END) AS dire_xp_dependence,
    winning_team   
FROM
    matches   
INNER JOIN
    matches_heroes     
        ON matches.match_id = matches_heroes.match_id   
INNER JOIN
    heroes     
        ON matches_heroes.hero_id = heroes.hero_id   
GROUP BY
    matches.match_id

Sample Results

match_id   | radiant_xp_dependence | dire_xp_dependence | winning_team

2298874871 | 1.0164                | 0.9689             | 1
2298884079 | 0.9932                | 1.0390             | 0
2298885606 | 0.9877                | 1.0015             | 1

EXPLAIN

id | select_type | table          | type   | possible_keys            | key     | key_len | ref                            | rows | Extra

1  | SIMPLE      | heroes         | ALL    | PRIMARY                  | NULL    | NULL    | NULL                           | 111  | Using temporary; Using filesort
1  | SIMPLE      | matches_heroes | ref    | PRIMARY,match_id,hero_id | hero_id | 2       | dota_2.heroes.hero_id          | 3213 |
1  | SIMPLE      | matches        | eq_ref | PRIMARY                  | PRIMARY | 8       | dota_2.matches_heroes.match_id | 1    |

Machine Specs

Intel Xeon E5
E5-1630v3 4/8t
3.7 / 3.8 GHz
64 GB of RAM
DDR4 ECC 2133 MHz
2 x 480GB of SSD SOFT

Database

MariaDB 10.0
InnoDB

Answer 1

In all likelihood, the main performance driver is the GROUP BY. Sometimes, in MySQL, it can be faster to use correlated subqueries. So, try writing the query like this:

SELECT m.match_id,
       (SELECT SUM(h.xp_from_wins / h.team_xp_from_wins)
        FROM matches_heroes mh INNER JOIN
             heroes h   
             ON mh.hero_id = h.hero_id
        WHERE m.match_id = mh.match_id AND mh.position < 5
       ) AS radiant_xp_dependence,
       (SELECT SUM(h.xp_from_wins / h.team_xp_from_wins)
        FROM matches_heroes mh INNER JOIN
             heroes h   
             ON mh.hero_id = h.hero_id
        WHERE m.match_id = mh.match_id AND mh.position >= 5
       ) AS dire_xp_dependence,
       m.winning_team   
FROM matches m;

Then, you want indexes on:

matches_heroes(match_id, position)
heroes(hero_id, xp_from_wins, team_xp_from_wins)

For completeness, you might want this index as well:

matches(match_id, winning_team)

This would be more important if you added ORDER BY match_id to the query.
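
The index list above is written in shorthand. As a minimal sketch, it could be created with statements like the following. The index names are made up for illustration and are not part of the answer; any names will do.

```sql
-- Hypothetical index names; the column lists come from the answer above.
CREATE INDEX idx_mh_match_position
    ON matches_heroes (match_id, position);

CREATE INDEX idx_heroes_xp
    ON heroes (hero_id, xp_from_wins, team_xp_from_wins);

-- Optional, "for completeness":
CREATE INDEX idx_matches_winner
    ON matches (match_id, winning_team);
```

Since InnoDB secondary indexes also carry the primary key columns, the (match_id, position) index on matches_heroes should be able to supply hero_id as well, but it is worth confirming with EXPLAIN on your own data.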

Answer 2

As has already been mentioned in a comment, there is little you can do, because you select all data from the table. The query looks perfect.

The one idea that comes to mind is covering indexes. With indexes containing all data needed for the query, the tables themselves don't have to be accessed anymore.

CREATE INDEX matches_quick ON matches(match_id, winning_team);

CREATE INDEX heroes_quick ON heroes(hero_id, xp_from_wins, team_xp_from_wins);

CREATE INDEX matches_heroes_quick ON matches_heroes (match_id, hero_id, position);

There is no guarantee that this will speed up your query. You are still reading all the data, so running through the indexes may be just as much work as reading the tables. But there is a chance that the joins will be faster and there would probably be fewer physical reads. Just give it a try.
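
One way to see whether the optimizer actually picks up the covering indexes is to run EXPLAIN on the query again after creating them. The sketch below is the question's query with table aliases added for readability; nothing else is changed.

```sql
-- Re-check the plan after adding the indexes.
EXPLAIN
SELECT
    m.match_id,
    SUM(CASE WHEN mh.position < 5
             THEN h.xp_from_wins / h.team_xp_from_wins
             ELSE 0 END) AS radiant_xp_dependence,
    SUM(CASE WHEN mh.position >= 5
             THEN h.xp_from_wins / h.team_xp_from_wins
             ELSE 0 END) AS dire_xp_dependence,
    m.winning_team
FROM matches m
INNER JOIN matches_heroes mh ON m.match_id = mh.match_id
INNER JOIN heroes h ON mh.hero_id = h.hero_id
GROUP BY m.match_id;
```

If a table is being read from a covering index, its row in the plan shows "Using index" in the Extra column. If it does not, the optimizer chose a different access path and the index is not helping that table.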

Answer 3

Waiting for another idea? :-)

Well, there is always the data warehouse approach. If you must run this query again and again and always for all matches ever played, then why not store the query results and access them later?

I suppose that matches played won't be altered, so you could access all results you computed, say, last week, and only retrieve the additional results for the games since then from your real tables.

Create a table archived_results. Add a flag archived to your matches table. Then add query results to the archived_results table and set the flag to TRUE for these matches. When you have to perform your query, you'd either update the archived_results table anew and then show only its contents, or you'd combine archive and current:

select match_id, radiant_xp_dependence, dire_xp_dependence, winning_team
from archived_results
union all
SELECT
    matches.match_id,
    SUM(CASE     
        WHEN position < 5 THEN xp_from_wins / team_xp_from_wins     
        ELSE 0    
    END) AS radiant_xp_dependence,
...
WHERE matches.archived = FALSE
GROUP BY matches.match_id;
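
The archive step itself is only described in words above. A minimal sketch of it might look like this. The column types of archived_results, the index on the archived flag, and the use of a transaction are all assumptions made for illustration. The sketch also assumes that every match with archived = FALSE already has all of its rows in matches_heroes.

```sql
-- Assumed schema additions (names follow the answer, types are guesses).
ALTER TABLE matches
    ADD COLUMN archived TINYINT(1) NOT NULL DEFAULT 0,
    ADD KEY archived (archived);

CREATE TABLE archived_results (
    match_id BIGINT UNSIGNED NOT NULL,
    radiant_xp_dependence DECIMAL(12,4),
    dire_xp_dependence DECIMAL(12,4),
    winning_team TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (match_id));

-- Periodic archive run: compute results for new matches, then flag them.
START TRANSACTION;

INSERT INTO archived_results
SELECT
    m.match_id,
    SUM(CASE WHEN mh.position < 5
             THEN h.xp_from_wins / h.team_xp_from_wins
             ELSE 0 END),
    SUM(CASE WHEN mh.position >= 5
             THEN h.xp_from_wins / h.team_xp_from_wins
             ELSE 0 END),
    m.winning_team
FROM matches m
INNER JOIN matches_heroes mh ON m.match_id = mh.match_id
INNER JOIN heroes h ON mh.hero_id = h.hero_id
WHERE m.archived = FALSE
GROUP BY m.match_id;

-- Flag only the matches that actually made it into the archive.
UPDATE matches m
INNER JOIN archived_results a ON a.match_id = m.match_id
SET m.archived = TRUE
WHERE m.archived = FALSE;

COMMIT;
```

One thing to keep in mind: if the totals in heroes keep changing as new matches come in, archived rows keep the values that were computed at archive time, so they can drift from what a fresh full query would return.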

Answer 4

People's comments about loading whole tables into memory got me thinking. I searched for "MySQL memory allocation" and learned how to change the buffer pool size for InnoDB tables. The default is much smaller than my database, so I ramped it up to 8 GB using the innodb_buffer_pool_size directive in my.cnf. The query's run time dropped drastically, from 1308 seconds to only 114.
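
For anyone wanting to make the same comparison, the current buffer pool size and the approximate size of each table can be checked with something like the following. The schema name dota_2 is taken from the EXPLAIN output in the question; adjust it if yours differs.

```sql
-- Current buffer pool size, reported in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate data + index size per table, in MB.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'dota_2'
ORDER BY size_mb DESC;
```

If the total is well above the buffer pool size, a query that reads every row has to keep going back to disk, which fits the speedup described above.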

After researching more settings, my my.cnf file now looks like the following (no further speed improvements, but it should be better in other situations).

[mysqld]
bind-address=127.0.0.1
character-set-server=utf8
collation-server=utf8_general_ci
innodb_buffer_pool_size=8G
innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=8M
innodb_log_file_size=64M
innodb_read_io_threads=64
innodb_write_io_threads=64

Thanks everyone for taking the time to help out. This will be a massive improvement to my website.

4 answers

Answer 1
3, accepted, 2016-04-17 11:22:25

Answer 2
2, 2016-04-17 09:15:04

Answer 3
1, 2016-04-17 10:41:55

Answer 4
1, 2016-04-18 20:20:52