Given the following DataFrame:
+----+--------+------------+------+---------------------+
| id | player | match_date | stat | days_since_max_stat |
+----+--------+------------+------+---------------------+
|  1 |      1 | 2022-01-01 | 1500 | NaN                 |
|  2 |      1 | 2022-01-03 | 1600 | 2                   |
|  3 |      1 | 2022-01-10 | 2100 | 7                   |
|  4 |      1 | 2022-01-11 | 1800 | 1                   |
|  5 |      1 | 2022-01-18 | 1700 | 8                   |
|  6 |      2 | 2022-01-01 | 1600 | NaN                 |
|  7 |      2 | 2022-01-03 | 1800 | 2                   |
|  8 |      2 | 2022-01-10 | 1600 | 7                   |
|  9 |      2 | 2022-01-11 | 1900 | 8                   |
| 10 |      2 | 2022-01-18 | 1500 | 7                   |
+----+--------+------------+------+---------------------+
How would I calculate the days_since_max_stat column? The calculation of this column is exclusive of the stat in that row and is done per player.
For example, the value for the row where id = 5 is 8 because the max stat was in the row where id = 3: days_since_max_stat = 2022-01-18 - 2022-01-10 = 8 days.
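As a throwaway sanity check on that arithmetic (not part of the setup below):

```python
import datetime as dt

# days between the row where id = 3 (the running max) and the row where id = 5
delta = dt.datetime(2022, 1, 18) - dt.datetime(2022, 1, 10)
print(delta.days)  # 8
```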
Here's the base DataFrame:
import datetime as dt
import pandas as pd

dates = [
    dt.datetime(2022, 1, 1),
    dt.datetime(2022, 1, 3),
    dt.datetime(2022, 1, 10),
    dt.datetime(2022, 1, 11),
    dt.datetime(2022, 1, 18),
]
df = pd.DataFrame(
    {
        "id": range(1, 11),
        "player": [1 for i in range(5)] + [2 for i in range(5)],
        "match_date": dates + dates,
        "stat": (1500, 1600, 2100, 1800, 1700, 1600, 1800, 1600, 1900, 1500),
    }
)
You can use a double groupby. The important part is to compute a new group key that puts together the rows that come after the last running max. Once you have that, it is a simple cumsum of the day differences per group:
g = df.groupby(df['player'])
# date diff per group (days)
diff = g['match_date'].diff().dt.days
# new group key: a fresh group starts right after each running max
g2 = df['stat'].ge(g['stat'].cummax()).shift().cumsum()
# days since last max
df['dsms'] = diff.groupby([df['player'], g2]).cumsum()
Output:
   id  player match_date  stat  dsms
0   1       1 2022-01-01  1500   NaN
1   2       1 2022-01-03  1600   2.0
2   3       1 2022-01-10  2100   7.0
3   4       1 2022-01-11  1800   1.0
4   5       1 2022-01-18  1700   8.0
5   6       2 2022-01-01  1600   NaN
6   7       2 2022-01-03  1800   2.0
7   8       2 2022-01-10  1600   7.0
8   9       2 2022-01-11  1900   8.0
9  10       2 2022-01-18  1500   7.0
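To see why the regrouping works, it can help to inspect the intermediate group key. A sketch rebuilding the same df (the names is_new_max and g2 are just for illustration):

```python
import datetime as dt
import pandas as pd

dates = [dt.datetime(2022, 1, d) for d in (1, 3, 10, 11, 18)]
df = pd.DataFrame({
    "id": range(1, 11),
    "player": [1] * 5 + [2] * 5,
    "match_date": dates + dates,
    "stat": [1500, 1600, 2100, 1800, 1700, 1600, 1800, 1600, 1900, 1500],
})

g = df.groupby(df["player"])
# True where a row's stat ties or beats its player's running max
is_new_max = df["stat"].ge(g["stat"].cummax())
# Shift one row, then cumsum: the counter bumps right *after* each
# running max, so every group spans "rows since the last max"
g2 = is_new_max.shift().cumsum()
# cumulative day differences within each (player, g2) group
diff = g["match_date"].diff().dt.days
dsms = diff.groupby([df["player"], g2]).cumsum()
```

The first row of each player has a NaN key (from the shift), so it falls out of the groupby and stays NaN in the result.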
First imagine you have only one player; then you can use expanding to find the cumulative max/idxmax, and subtract:
def day_since_max(data):
    # index of the running max, inclusive of the current row
    maxIdx = data['stat'].expanding().apply(pd.Series.idxmax)
    # shift so each row sees the date of the *previous* running max
    date_at_max = data.loc[maxIdx, 'match_date'].shift()
    return data['match_date'] - date_at_max.values
Now, we can use groupby().apply to apply that function to each player:
df['days_since_max'] = df.groupby('player').apply(day_since_max).reset_index(level=0, drop=True)
Output:
   id  player match_date  stat days_since_max
0   1       1 2022-01-01  1500            NaT
1   2       1 2022-01-03  1600         2 days
2   3       1 2022-01-10  2100         7 days
3   4       1 2022-01-11  1800         1 days
4   5       1 2022-01-18  1700         8 days
5   6       2 2022-01-01  1600            NaT
6   7       2 2022-01-03  1800         2 days
7   8       2 2022-01-10  1600         7 days
8   9       2 2022-01-11  1900         8 days
9  10       2 2022-01-18  1500         7 days
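To make the expanding idxmax step concrete, here is a sketch on one player's five rows (the astype(int) cast is only to keep .loc happy with the float indices that expanding().apply returns; the function above relies on .loc accepting them directly):

```python
import datetime as dt
import pandas as pd

dates = [dt.datetime(2022, 1, d) for d in (1, 3, 10, 11, 18)]
one = pd.DataFrame({
    "match_date": dates,
    "stat": [1500, 1600, 2100, 1800, 1700],
})

# index of the running max so far, inclusive of the current row
max_idx = one["stat"].expanding().apply(pd.Series.idxmax).astype(int)
# date at that max, shifted so each row sees the *previous* running max
date_at_max = one.loc[max_idx, "match_date"].shift()
# subtract via .values to ignore the duplicated index labels
days = one["match_date"] - date_at_max.values
```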
Here's another solution that is very similar to @QuangHoang's, only instead of applying a function to get the values, it uses transform.
(i) groupby "player" and, in an expanding window, find the max "stat" for each player
(ii) groupby (i) by "player" and "stat", find the index of the max stat of each group, and transform it back onto the Series
(iii) filter df with (ii), then groupby "player" again and shift (so that we can find the difference in days)
(iv) find the difference in days
# (i) running max per player, aligned back to df's index
exp_max = df.groupby('player')['stat'].expanding().max().reset_index(level=0)
# (ii) first row index where each running-max value appeared
idxmax = exp_max.groupby(['player','stat'])['stat'].transform('idxmax')
# (iii) date at that row, shifted one step within each player
previous_max = df.loc[idxmax, 'match_date'].groupby(df['player']).shift().reset_index(drop=True)
# (iv) difference in days
df['days_since_max'] = df['match_date'] - previous_max
Output:
   id  player match_date  stat days_since_max
0   1       1 2022-01-01  1500            NaT
1   2       1 2022-01-03  1600         2 days
2   3       1 2022-01-10  2100         7 days
3   4       1 2022-01-11  1800         1 days
4   5       1 2022-01-18  1700         8 days
5   6       2 2022-01-01  1600            NaT
6   7       2 2022-01-03  1800         2 days
7   8       2 2022-01-10  1600         7 days
8   9       2 2022-01-11  1900         8 days
9  10       2 2022-01-18  1500         7 days
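Steps (i) and (ii) can be verified in isolation (a sketch rebuilding the same df):

```python
import datetime as dt
import pandas as pd

dates = [dt.datetime(2022, 1, d) for d in (1, 3, 10, 11, 18)]
df = pd.DataFrame({
    "id": range(1, 11),
    "player": [1] * 5 + [2] * 5,
    "match_date": dates + dates,
    "stat": [1500, 1600, 2100, 1800, 1700, 1600, 1800, 1600, 1900, 1500],
})

# (i) running max per player; reset_index(level=0) restores df's row index
exp_max = df.groupby("player")["stat"].expanding().max().reset_index(level=0)
# (ii) for each (player, running-max) pair, the first row index where that
# max value appeared -- i.e. the row that currently holds the max
idxmax = exp_max.groupby(["player", "stat"])["stat"].transform("idxmax")
```

Rows 2-4 of player 1 all point back to row 2 (stat 2100), which is exactly the row whose date gets shifted and subtracted in steps (iii) and (iv).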