MySQL very slow query
My table contains the following columns:
gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)
The table has the following indexes:
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs | 0 | PRIMARY | 1 | player_gamelogs_id | A | 371330 | NULL | NULL | | BTREE | | |
| player_gamelogs | 1 | player_name | 1 | player_name | A | 3375 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | points | 1 | points | A | 506 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_id | 1 | game_id | A | 37133 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | season | 1 | season | A | 30 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | team_abbreviation | 1 | team_abbreviation | A | 70 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 3 | dk_points | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 3 | season_id | A | 371330 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
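I'm trying to compute each player's average points within a season before each game is played. So for the third game of the season, avg_points would be the average of games 1 and 2. Game IDs are assigned in order, so an earlier game always has a lower game_id than a later one. I could also use a date field instead, but I assumed a numeric comparison would be faster?
My query is as follows: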
SELECT game_id,
player_id,
player_name,
(SELECT avg(points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
EXPLAIN produces the following output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 371330 | Using filesort |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | game_id,player_id,game_player_season | NULL | NULL | NULL | 371330 | Range checked for each record (index map: 0xC8) |
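I'm not sure whether this is due to the nature of the task or to an inefficiency in my query. Thanks for any suggestions!
Please consider this query: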
SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
LEFT JOIN player_gamelogs t2 ON
t1.game_id > t2.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id
GROUP BY
t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;
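To see how the join form compares with the correlated subquery from the question, here is a minimal sketch that runs both on three made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample data (player "Ann", season 2015) is purely hypothetical.

```python
import sqlite3

# Hypothetical sample: one player, one season, three games.
SAMPLE_ROWS = [
    (1, 7, "Ann", 1, 2015, 10),
    (2, 7, "Ann", 2, 2015, 20),
    (3, 7, "Ann", 3, 2015, 30),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE player_gamelogs ("
    "gamelogs_id INTEGER PRIMARY KEY, player_id INT, player_name TEXT, "
    "game_id INT, season_id INT, points INT)"
)
conn.executemany(
    "INSERT INTO player_gamelogs VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
)

# Correlated subquery, as in the question.
subquery_sql = """
SELECT game_id,
       (SELECT AVG(t2.points)
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id
          AND t1.player_id = t2.player_id
          AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id
"""

# Self LEFT JOIN with GROUP BY, as in this answer.
join_sql = """
SELECT t1.game_id, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
LEFT JOIN player_gamelogs t2
       ON t1.game_id > t2.game_id
      AND t1.player_id = t2.player_id
      AND t1.season_id = t2.season_id
GROUP BY t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id
"""

subquery_result = conn.execute(subquery_sql).fetchall()
join_result = conn.execute(join_sql).fetchall()
print(subquery_result)  # [(1, None), (2, 10.0), (3, 15.0)]
print(join_result)      # [(1, 0.0), (2, 10.0), (3, 15.0)]
```

The two forms agree except for the first game of a season: the subquery yields NULL because there are no prior games, while the COALESCE in the join form reports 0. The COALESCE also counts any prior game whose points value is NULL as 0, whereas a plain AVG would skip it.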
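Notes:
GROUP BY implicitly sorts by the grouped columns in MySQL versions before 8.0. Avoid ordering afterwards if you can, as it adds needless overhead. As pointed out in the comments, this implicit sort is not official behavior, so weigh the convenience of relying on it against the risk of suddenly losing the ordering.
Your query is fine as written: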
SELECT game_id, player_id, player_name,
(SELECT avg(t2.points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id AND
t1.player_id = t2.player_id AND
t1.season_id = t2.season_id
) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
However, for best performance you want two composite indexes: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).
The first index should speed up the subquery. The second is for the outer ORDER BY.
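As a rough illustration of the two indexes, the sketch below creates them and checks which one the subquery picks up. It runs against SQLite through Python's sqlite3 module, so the planner is not MySQL's; on MySQL you would create the same indexes and re-run EXPLAIN. The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE player_gamelogs ("
    "gamelogs_id INTEGER PRIMARY KEY, player_id INT, player_name TEXT, "
    "game_id INT, season_id INT, points INT)"
)

# Index names are made up; the column lists are the ones suggested above.
conn.execute(
    "CREATE INDEX idx_player_season_game_points "
    "ON player_gamelogs (player_id, season_id, game_id, points)"
)
conn.execute(
    "CREATE INDEX idx_name_game_season "
    "ON player_gamelogs (player_name, game_id, season_id)"
)

query = """
SELECT game_id, player_id, player_name,
       (SELECT AVG(t2.points)
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id
          AND t1.player_id = t2.player_id
          AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id
"""

# The fourth column of each EXPLAIN QUERY PLAN row is a readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[3] for row in plan]
for detail in plan_details:
    print(detail)
```

The first index contains every column the subquery touches, with the two equality columns first and the range column game_id after them, so the subquery can be answered from the index alone without reading table rows.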
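As your query stands, for every player you are processing every game together with all the games before it. For example, with 10 games per person, you get the following results for each season/person: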
Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.
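To put a number on the listing above, here is a small sketch that counts how many prior-game rows have to be read per player and season. It assumes, for illustration only, that each prior row is read exactly once per average; the function name is made up.

```python
def prior_rows_read(games_per_player: int) -> int:
    """Total prior-game rows read when every game gets its own average."""
    # Game k (1-based) has k - 1 earlier games in the same season.
    return sum(k - 1 for k in range(1, games_per_player + 1))


all_games = prior_rows_read(10)  # an average for each of the 10 games
latest_only = 10 - 1             # an average for the most recent game only
print(all_games, latest_only)    # 45 9
```

The per-game form grows roughly with the square of the number of games, while averaging only for the latest game grows linearly.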
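You state that you want the most recent game together with the average of everything before it. Given that, I'm assuming you don't care about the rows for each person's earlier games.
You are also querying across all seasons. If a season has finished, do you care about old seasons, or only the current one? Without a filter you go through all seasons and all players.
So, with all that said, I offer the following. First, a WHERE clause restricts the query to the latest season, but I deliberately left the season in the select list and GROUP BY in case you want other seasons. Then I take the MAXIMUM game_id for a given person/season as the baseline, which yields one row per person/season, and average everything below it. So in the 10-game scenario, I won't pick up the base rows for games 9 through 2; only game #10 is returned.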
select
    pgMax.Player_ID,
    pgMax.Season_ID,
    pgMax.mostRecentGameID,
    pgl3.points as mostRecentGamePoints,
    pgl3.player_name,
    coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
from
    -- pgMax: the most recent game per player within the chosen season
    ( select pgl1.player_id,
             pgl1.season_id,
             max( pgl1.game_id ) as mostRecentGameID
      from
          player_gamelogs pgl1
      where
          pgl1.season_id = JustOneSeason  -- placeholder: substitute the season_id you want
      group by
          pgl1.player_id,
          pgl1.season_id ) pgMax
    -- pgl2: every earlier game of that player in the same season
    JOIN player_gamelogs pgl2
        on pgMax.player_id = pgl2.player_id
        AND pgMax.season_id = pgl2.season_id
        AND pgMax.mostRecentGameID > pgl2.game_id
    -- pgl3: the row for the most recent game itself
    JOIN player_gamelogs pgl3
        on pgMax.player_id = pgl3.player_id
        AND pgMax.season_id = pgl3.season_id
        AND pgMax.mostRecentGameID = pgl3.game_id
group by
    pgMax.Player_ID,
    pgMax.Season_ID,
    -- the columns below are listed so the query also runs under ONLY_FULL_GROUP_BY
    pgMax.mostRecentGameID,
    pgl3.points,
    pgl3.player_name
order by
    pgMax.Player_ID
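To check what the latest-game query returns, here is a minimal sketch that runs a version of it, with a single alias per joined table, on a handful of made-up rows. It uses Python's sqlite3 module as a stand-in for MySQL, passes the season as a bound parameter in place of JustOneSeason, and the players and point values are hypothetical.

```python
import sqlite3

# Hypothetical rows: (player_id, player_name, game_id, season_id, points)
SAMPLE_ROWS = [
    (7, "Ann", 5, 2014, 99),   # older season, excluded by the WHERE clause
    (7, "Ann", 11, 2015, 10),
    (7, "Ann", 12, 2015, 20),
    (7, "Ann", 13, 2015, 30),
    (8, "Bob", 11, 2015, 4),
    (8, "Bob", 12, 2015, 8),
    (9, "Cy", 11, 2015, 5),    # only one game in the season
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE player_gamelogs ("
    "gamelogs_id INTEGER PRIMARY KEY, player_id INT, player_name TEXT, "
    "game_id INT, season_id INT, points INT)"
)
conn.executemany(
    "INSERT INTO player_gamelogs "
    "(player_id, player_name, game_id, season_id, points) "
    "VALUES (?, ?, ?, ?, ?)",
    SAMPLE_ROWS,
)

latest_game_sql = """
SELECT pgMax.player_id, pgMax.season_id, pgMax.mostRecentGameID,
       pgl3.points AS mostRecentGamePoints, pgl3.player_name,
       COALESCE(AVG(pgl2.points), 0) AS AvgPointsPriorToCurrentGame
FROM (SELECT pgl1.player_id, pgl1.season_id,
             MAX(pgl1.game_id) AS mostRecentGameID
      FROM player_gamelogs pgl1
      WHERE pgl1.season_id = ?
      GROUP BY pgl1.player_id, pgl1.season_id) pgMax
JOIN player_gamelogs pgl2
  ON pgMax.player_id = pgl2.player_id
 AND pgMax.season_id = pgl2.season_id
 AND pgMax.mostRecentGameID > pgl2.game_id
JOIN player_gamelogs pgl3
  ON pgMax.player_id = pgl3.player_id
 AND pgMax.season_id = pgl3.season_id
 AND pgMax.mostRecentGameID = pgl3.game_id
GROUP BY pgMax.player_id, pgMax.season_id, pgMax.mostRecentGameID,
         pgl3.points, pgl3.player_name
ORDER BY pgMax.player_id
"""

latest_rows = conn.execute(latest_game_sql, (2015,)).fetchall()
for row in latest_rows:
    print(row)
# (7, 2015, 13, 30, 'Ann', 15.0)
# (8, 2015, 12, 8, 'Bob', 4.0)
```

Player 9 does not appear in the output: with an inner JOIN on pgl2, a player whose most recent game is also their first game of the season has no earlier rows to join and is dropped. If such players should be listed with an average of 0, a LEFT JOIN on pgl2 would keep them, which is also the only case in which the COALESCE takes effect.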
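Now, to optimize the query, a composite index on (player_id, season_id, game_id, points) would be best. However, if you are only ever looking at the current season, make the index (season_id, player_id, game_id, points) instead, putting season_id in the first position to pre-qualify the WHERE clause.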
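A small sketch of the season-first variant, again using Python's sqlite3 module as a stand-in for MySQL, with a hypothetical index name. It only shows the shape of the index and one way to confirm that the per-season MAX(game_id) step can use it; MySQL's optimizer may decide differently, so check with EXPLAIN there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE player_gamelogs ("
    "gamelogs_id INTEGER PRIMARY KEY, player_id INT, player_name TEXT, "
    "game_id INT, season_id INT, points INT)"
)

# Hypothetical index name; season_id leads so the WHERE filter narrows first.
conn.execute(
    "CREATE INDEX idx_season_player_game_points "
    "ON player_gamelogs (season_id, player_id, game_id, points)"
)

max_game_sql = """
SELECT pgl1.player_id, pgl1.season_id,
       MAX(pgl1.game_id) AS mostRecentGameID
FROM player_gamelogs pgl1
WHERE pgl1.season_id = ?
GROUP BY pgl1.player_id, pgl1.season_id
"""

# The fourth column of each EXPLAIN QUERY PLAN row is a readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + max_game_sql, (2015,)).fetchall()
season_plan = [row[3] for row in plan]
for detail in season_plan:
    print(detail)
```

With season_id in the leading position, the equality filter selects one contiguous slice of the index, and within that slice the entries are already ordered by player_id and then game_id.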