MySQL very slow query

My table has the following columns:

gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)

The table has the following indexes:

+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name           | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs |          0 | PRIMARY            |            1 | player_gamelogs_id | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
| player_gamelogs |          1 | player_name        |            1 | player_name        | A         |        3375 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | points             |            1 | points             | A         |         506 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_id            |            1 | game_id            | A         |       37133 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | season             |            1 | season             | A         |          30 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | team_abbreviation  |            1 | team_abbreviation  | A         |          70 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            3 | dk_points          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            3 | season_id          | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I am trying to calculate a player's mean points for a season over the games played before a given game. So for the 3rd game of the season, avg_points would be the mean of games 1 and 2. The game ids are sequential, so an earlier game always has a lower game_id than a later one. I also have the option to use a date field, but I figured that a numeric comparison would be faster?

My query is as follows:

SELECT game_id, 
       player_id, 
       player_name, 
       (SELECT avg(points) 
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id 
           AND t1.player_id = t2.player_id 
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;

EXPLAIN produces the following output:

| id | select_type        | table | type | possible_keys                        | key  | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | t1    | ALL  | NULL                                 | NULL | NULL    | NULL | 371330 | Using filesort                                  |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | game_id,player_id,game_player_season | NULL | NULL    | NULL | 371330 | Range checked for each record (index map: 0xC8) |

I am not sure whether the slowness is due to the nature of the task or to an inefficiency in my query. Thanks for any suggestions!

Please consider this query:

SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
        LEFT JOIN player_gamelogs t2 ON 
                t1.game_id > t2.game_id 
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id 
GROUP BY
    t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;

Notes:

  • To perform optimally, you'd need an additional index on (season_id, game_id, player_id, player_name).
  • Even better would be a separate player table from which to retrieve the name by id. It seems redundant to grab the player name from a log table, all the more so if it then has to be part of an index.
  • GROUP BY may already sort by the grouped columns, in which case ordering afterwards only adds overhead. As outlined in the comments, though, this is not guaranteed behavior (MySQL 8.0 no longer sorts GROUP BY results implicitly), so weigh the saving against the risk of suddenly losing the sort order.
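
The first two notes could be sketched as follows. The index name, the `players` table, and the `VARCHAR(100)` length are assumptions made for illustration; they do not come from the original schema.

```sql
-- Index suggested in the first note (the index name is hypothetical).
CREATE INDEX idx_season_game_player_name
    ON player_gamelogs (season_id, game_id, player_id, player_name);

-- Hypothetical lookup table for the second note: the name is stored once
-- per player instead of once per log row. The column length is assumed.
CREATE TABLE players (
    player_id   INT          NOT NULL PRIMARY KEY,
    player_name VARCHAR(100) NOT NULL
);
```

With such a table in place, the query would join `players` on `player_id` to fetch the name, and `player_name` could be dropped from the index.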

Your query is fine as written:

SELECT game_id, player_id, player_name, 
       (SELECT avg(t2.points) 
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id AND
              t1.player_id = t2.player_id AND
              t1.season_id = t2.season_id
      ) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;

But for optimal performance you want two composite indexes on the table: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).

The first index should speed up the subquery. The second is for the outer ORDER BY.
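
As a sketch, the two indexes could be created like this. The index names are made up for illustration, and it is worth re-running EXPLAIN afterwards to see whether the optimizer actually picks them up.

```sql
-- Covers the correlated subquery: equality on player_id and season_id,
-- a range on game_id, and points read straight from the index.
CREATE INDEX idx_player_season_game_points
    ON player_gamelogs (player_id, season_id, game_id, points);

-- Intended for the outer ORDER BY player_name, game_id.
CREATE INDEX idx_name_game_season
    ON player_gamelogs (player_name, game_id, season_id);

-- Check the new plan.
EXPLAIN
SELECT game_id, player_id, player_name,
       (SELECT AVG(t2.points)
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id
           AND t1.player_id = t2.player_id
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;
```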

As your query stands, for every player it processes EACH game together with all the games before it. So, for example, if you had 10 games per person, you would get the following results per season/person:

Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.
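
For reference, a listing like the one above (each game paired with the average of the games before it) can also be produced with a window function instead of a correlated subquery. This is a sketch of a different technique from the one in the question. It assumes MySQL 8.0 or later, and that a player has at most one row per game within a season.

```sql
-- Running average of all earlier games per player and season.
-- The frame ends one row before the current row, so the first game of a
-- season gets NULL, as the correlated subquery would return.
SELECT game_id,
       player_id,
       player_name,
       AVG(points) OVER (
           PARTITION BY player_id, season_id
           ORDER BY game_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS avg_points
  FROM player_gamelogs
 ORDER BY player_name, game_id;
```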

You stated you wanted the most recent game with the average of everything before it. That said, I am assuming you do NOT care about each of the earlier games per person.

Your query also covers ALL seasons. If a season is finished, do you care about old seasons, or just the current one? Otherwise you are going through all seasons and all players.

All that said, I offer the following. First, limit the query to whatever the latest season is by using the WHERE clause, but I am INTENTIONALLY leaving the season in the query / group by in case you DO want other seasons. Then, I get the MAXIMUM game for a given person / season as the baseline for the final 1 row (per person and season), and take the average of everything before it. So, in the sample scenario of games 10 down to 2, I won't be returning the underlying rows 9-2, just the #10 game.

select
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points as mostRecentGamePoints,
      pgl3.player_name,
      coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
   from
      -- latest game per player for the chosen season
      ( select pgl1.player_id,
               pgl1.season_id,
               max( pgl1.game_id ) as mostRecentGameID
           from
              player_gamelogs pgl1
           where
               pgl1.season_id = JustOneSeason   -- placeholder: substitute the season_id you want
           group by
              pgl1.player_id,
              pgl1.season_id ) pgMax

         -- all earlier games for that player / season;
         -- LEFT JOIN keeps players who have only one game (coalesce then yields 0)
         LEFT JOIN player_gamelogs pgl2
            on pgMax.player_id = pgl2.player_id
           AND pgMax.season_id = pgl2.season_id
           AND pgMax.mostRecentGameID > pgl2.game_id

         -- the most recent game row itself
         JOIN player_gamelogs pgl3
            on pgMax.player_id = pgl3.player_id
           AND pgMax.season_id = pgl3.season_id
           AND pgMax.mostRecentGameID = pgl3.game_id
   group by
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points,
      pgl3.player_name
   order by
      pgMax.player_id

Now, to optimize the query, a composite index on (player_id, season_id, game_id, points) would be best. HOWEVER, if you are only looking for whatever "the current season" is, then index on (season_id, player_id, game_id, points) instead, putting the season ID in first position so the WHERE clause can use it to narrow the rows.
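
A minimal sketch of the season-first variant, assuming a made-up index name. Only one of the two column orders is normally needed, depending on whether the query filters on a single season.

```sql
-- Season first: the WHERE season_id = ... filter narrows the index range,
-- then player_id and game_id serve the grouping and the joins, and points
-- is available from the index without a table lookup.
CREATE INDEX idx_season_player_game_points
    ON player_gamelogs (season_id, player_id, game_id, points);
```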
