My table has the following columns:
gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)
The table has the following indexes:
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs | 0 | PRIMARY | 1 | player_gamelogs_id | A | 371330 | NULL | NULL | | BTREE | | |
| player_gamelogs | 1 | player_name | 1 | player_name | A | 3375 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | points | 1 | points | A | 506 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_id | 1 | game_id | A | 37133 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | season | 1 | season | A | 30 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | team_abbreviation | 1 | team_abbreviation | A | 70 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 3 | dk_points | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 3 | season_id | A | 371330 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I am trying to calculate the mean of points for a season and player prior to the game being played. So for the 3rd game of the season, avg_points would be the mean of games 1 and 2. The game numbers are in sequential order such that an earlier game is less than a later game. I also have the option to use a date field but I figured that numeric comparison would be faster?
My query is as follows:
SELECT game_id,
player_id,
player_name,
(SELECT avg(points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
EXPLAIN produces the following output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 371330 | Using filesort |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | game_id,player_id,game_player_season | NULL | NULL | NULL | 371330 | Range checked for each record (index map: 0xC8) |
I am not sure if it is because of the nature of the task involved or because of an inefficiency in my query. Thanks for any suggestions!
Please consider this query:
SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
LEFT JOIN player_gamelogs t2 ON
t1.game_id > t2.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id
GROUP BY
t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;
Notes:
In older MySQL versions, GROUP BY also sorts by the grouped columns. Where that ordering is all you need, you can avoid ordering afterwards, as it generates useless overhead. As outlined in the comments, this is not guaranteed behavior (MySQL 8.0 no longer sorts GROUP BY results implicitly), so weigh the saving against the risk of suddenly losing the sort order.
Your query is fine as written:
SELECT game_id, player_id, player_name,
(SELECT avg(t2.points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id AND
t1.player_id = t2.player_id AND
t1.season_id = t2.season_id
) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
But for optimal performance you want two composite indexes on the table: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).
The first index should speed up the subquery. The second is for the outer ORDER BY.
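As a rough sketch, the two indexes could be created as follows. The index names are made up for illustration; only the column lists come from the suggestion above.

```sql
-- Hypothetical index names; the column lists are the ones suggested above.

-- For the correlated subquery: the equality columns (player_id, season_id)
-- come first, then the range column (game_id), then points so that
-- AVG(points) can be computed from the index alone.
CREATE INDEX idx_player_season_game_points
    ON player_gamelogs (player_id, season_id, game_id, points);

-- Intended to support the outer ORDER BY player_name, game_id.
CREATE INDEX idx_name_game_season
    ON player_gamelogs (player_name, game_id, season_id);
```

After creating them, re-running EXPLAIN on the query should show whether the optimizer actually picks the new indexes for t2 and for the sort.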
As your query stands, you are running the subquery for EACH game, averaging all the games before it, for every player. So, for example, if you had 10 games per person, you get the following results per season/person:
Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.
You stated you wanted the most recent game with the average of everything under it. That said, I am assuming you do NOT care about each of the lower game levels per person.
You are also running the query across ALL seasons. If a season is finished, do you care about old seasons, or just the current one? Otherwise you are going through all seasons and all players.
All that said, I offer the following. First, limit the query to the latest season with the WHERE clause; I am INTENTIONALLY leaving the season in the select list and GROUP BY in case you DO want other seasons. Then I get the MAXIMUM game for a given person and season as the baseline for the single final row (one per person and season), and take the average of everything under that. So, in the sample scenario of 10 games, I won't be returning the underlying rows for games 9 down to 2, just the row for game 10.
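select
    pgMax.player_id,
    pgMax.season_id,
    pgMax.mostRecentGameID,
    pgl3.points as mostRecentGamePoints,
    pgl3.player_name,
    coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
from
    -- latest game per player for the chosen season
    ( select pgl1.player_id,
             pgl1.season_id,
             max( pgl1.game_id ) as mostRecentGameID
        from
            player_gamelogs pgl1
        where
            pgl1.season_id = JustOneSeason  -- placeholder: substitute the season_id you want
        group by
            pgl1.player_id,
            pgl1.season_id ) pgMax
    -- all earlier games of the same player and season; LEFT JOIN keeps players
    -- with only one game so far, for whom coalesce() then supplies 0
    LEFT JOIN player_gamelogs pgl2
        on pgMax.player_id = pgl2.player_id
        AND pgMax.season_id = pgl2.season_id
        AND pgMax.mostRecentGameID > pgl2.game_id
    -- the row for the most recent game itself
    JOIN player_gamelogs pgl3
        on pgMax.player_id = pgl3.player_id
        AND pgMax.season_id = pgl3.season_id
        AND pgMax.mostRecentGameID = pgl3.game_id
group by
    -- every non-aggregated select column is listed so the query
    -- also runs with ONLY_FULL_GROUP_BY enabled
    pgMax.player_id,
    pgMax.season_id,
    pgMax.mostRecentGameID,
    pgl3.points,
    pgl3.player_name
order by
    pgMax.player_id;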
Now, for optimizing the query, a composite index would be best on (player_id, season_id, game_id, points). HOWEVER, if you are only looking for whatever "the current season" is, then have your index on (season_id, player_id, game_id, points) putting the SEASON ID in first position to prequalify the WHERE clause.
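A minimal sketch of the season-first variant, assuming a made-up index name and a made-up season value (2015) purely for illustration:

```sql
-- Hypothetical index name; the column order is the season-first one described above.
CREATE INDEX idx_season_player_game_points
    ON player_gamelogs (season_id, player_id, game_id, points);

-- Check whether the optimizer uses it for the per-player "latest game" lookup.
-- 2015 is an assumed season_id, not a value taken from the question.
EXPLAIN
SELECT player_id, season_id, MAX(game_id) AS mostRecentGameID
FROM player_gamelogs
WHERE season_id = 2015
GROUP BY player_id, season_id;
```

If the key column of the EXPLAIN output names the new index, the WHERE clause on season_id is being served from the index rather than from a full table scan.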