Groupby with finding highest value in subset
My data looks like this:
In [16]: game_df.head(9)
Out[16]:
   team_id  game_id game_date  w  l  wins  losses  winning%
0        1        1  11/16/18  1  0    20      10  0.666667
1        1        3  11/18/18  0  1    20      11  0.645161
2        1        6  11/21/18  0  1    20      12  0.625000
3        2        4  11/19/18  1  0    16      14  0.533333
4        2        8  11/23/18  1  0    17      14  0.548387
5        2        9  11/24/18  0  1    17      15  0.531250
6        3        2  11/17/18  0  1    24       8  0.750000
7        3        5  11/20/18  1  0    25       8  0.757576
8        3        7  11/22/18  1  0    26       8  0.764706
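I need to take the winning% column and, for each row, subtract the highest of the most recent winning% values across all team_ids (up to and including that row's date) from the row's own value.
So I would like to get back something like this: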
In [16]: game_df.head(9)
Out[16]:
   team_id  game_id game_date  w  l  wins  losses  winning%    w%_bac
0        1        1  11/16/18  1  0    20      10  0.666667        --
1        1        3  11/18/18  0  1    20      11  0.645161  -0.10483
2        1        6  11/21/18  0  1    20      12  0.625000  -0.13257
3        2        4  11/19/18  1  0    16      14  0.533333  -0.21667
4        2        8  11/23/18  1  0    17      14  0.548387  -0.21632
5        2        9  11/24/18  0  1    17      15  0.531250  -0.23346
6        3        2  11/17/18  0  1    24       8  0.750000   0.00000
7        3        5  11/20/18  1  0    25       8  0.757576   0.00000
8        3        7  11/22/18  1  0    26       8  0.764706   0.00000
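For example, in game 9 on 11/24/18, team 2 lost and its winning percentage dropped from 0.548387 to 0.531250. It therefore still trails the other two teams, which stood at 0.625000 (team 1) and 0.764706 (team 3) at that point. So w%_bac for team 2 would be 0.531250 - 0.764706 = -0.233456.
Finally, I need to work out the ordering of the team_ids at that moment, i.e. on 11/24/18 the team_id ranking is 3, 1, 2.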
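To make the arithmetic concrete, here is a minimal plain-Python sketch of that single 11/24/18 snapshot. The `latest` dictionary is just an illustrative name holding each team's most recent winning% as of that date; it is not part of my actual code.

```python
# Most recent winning% per team_id as of 11/24/18 (values from the table above).
latest = {1: 0.625000, 2: 0.531250, 3: 0.764706}

# w%_bac for team 2: its own value minus the best value among all teams.
w_bac = latest[2] - max(latest.values())
print(round(w_bac, 6))  # -0.233456

# Ordering of team_ids from highest to lowest winning%.
order = sorted(latest, key=latest.get, reverse=True)
print(order)  # [3, 1, 2]
```

What I am after is the same calculation repeated for every row, using whatever each team's latest value was on that row's date.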
Thanks
import pandas as pd

# df is the DataFrame from the question (game_df)
# sort by date (these string dates only sort correctly because all games share a month and year)
df = df.sort_values(by='game_date')
# add a column for each team's latest winning%, forward-filling NaN (but not back-filling)
for team_id in df['team_id'].unique():
    latest = df.loc[df.team_id == team_id].set_index('game_date')['winning%']
    df[str(team_id) + 'win_%'] = latest.reindex(df.game_date).ffill().values
# fill the remaining NaN (teams that have not played yet) with 0
df = df.fillna(0)
# get the min difference (greatest negative) for each row
df['w%_bac'] = pd.concat([df['winning%'] - df['1win_%'], df['winning%'] - df['2win_%'],
                          df['winning%'] - df['3win_%']], axis=1).min(axis=1)
# drop helper columns
df = df.drop(columns=['1win_%', '2win_%', '3win_%'])
df
   team_id  game_id game_date  w  l  wins  losses  winning%  w%_bac
0        1        1  11/16/18  1  0    20      10     0.667   0.000
6        3        2  11/17/18  0  1    24       8     0.750   0.000
1        1        3  11/18/18  0  1    20      11     0.645  -0.105
3        2        4  11/19/18  1  0    16      14     0.533  -0.217
7        3        5  11/20/18  1  0    25       8     0.758   0.000
2        1        6  11/21/18  0  1    20      12     0.625  -0.133
8        3        7  11/22/18  1  0    26       8     0.765   0.000
4        2        8  11/23/18  1  0    17      14     0.548  -0.216
5        2        9  11/24/18  0  1    17      15     0.531  -0.233
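The code above hard-codes the three team ids and does not cover the ranking part of the question. As a possible alternative, here is a sketch that pivots to one column per team, so it should work for any number of teams, and also derives each row's standing. It assumes each team plays at most one game per date (otherwise `pivot` raises an error); the names `latest`, `best`, `ranks` and the `rank` column are illustrative and not from the original post.

```python
import pandas as pd

# Data copied from the question (only the columns needed here).
game_df = pd.DataFrame({
    "team_id":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "game_id":   [1, 3, 6, 4, 8, 9, 2, 5, 7],
    "game_date": ["11/16/18", "11/18/18", "11/21/18", "11/19/18", "11/23/18",
                  "11/24/18", "11/17/18", "11/20/18", "11/22/18"],
    "winning%":  [0.666667, 0.645161, 0.625000, 0.533333, 0.548387,
                  0.531250, 0.750000, 0.757576, 0.764706],
})

df = game_df.copy()
# Parse the dates so the sort is chronological across months and years.
df["game_date"] = pd.to_datetime(df["game_date"], format="%m/%d/%y")
df = df.sort_values("game_date")

# One row per date, one column per team; forward-fill so each row holds every
# team's most recent winning% as of that date (NaN until a team has played).
latest = df.pivot(index="game_date", columns="team_id", values="winning%").ffill()

# Best winning% among all teams as of each date (NaN is skipped by max),
# mapped back onto the game rows by date.
best = latest.max(axis=1)
df["w%_bac"] = df["winning%"] - df["game_date"].map(best)

# Standing of each row's team on that date (1 = highest winning%).
ranks = latest.rank(axis=1, ascending=False, method="min")
df["rank"] = [int(ranks.at[d, t]) for d, t in zip(df["game_date"], df["team_id"])]

print(df[["team_id", "game_date", "winning%", "w%_bac", "rank"]])
# team_ids ordered by their latest winning% on the final date (11/24/18).
print(latest.iloc[-1].sort_values(ascending=False).index.tolist())  # [3, 1, 2]
```

Teams that have not played yet are left as NaN and simply ignored by `max` and `rank`, so no fill with 0 is needed, and the first row gets a w%_bac of 0 as in the output above.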