I have data as follows:
In [16]: game_df.head(9)
Out[16]:
team_id game_id game_date w l wins losses winning%
0 1 1 11/16/18 1 0 20 10 0.666667
1 1 3 11/18/18 0 1 20 11 0.645161
2 1 6 11/21/18 0 1 20 12 0.625000
3 2 4 11/19/18 1 0 16 14 0.533333
4 2 8 11/23/18 1 0 17 14 0.548387
5 2 9 11/24/18 0 1 17 15 0.531250
6 3 2 11/17/18 0 1 24 8 0.750000
7 3 5 11/20/18 1 0 25 8 0.757576
8 3 7 11/22/18 1 0 26 8 0.764706
What I need is to take the winning% column and, for each row, subtract the latest winning% of every team_id (the row's own team included) as of that game's date, keeping only the largest deficit, i.e. the most negative difference.
So I would want to get something like this back:
In [16]: game_df.head(9)
Out[16]:
team_id game_id game_date w l wins losses winning% w%_bac
0 1 1 11/16/18 1 0 20 10 0.666667 --
1 1 3 11/18/18 0 1 20 11 0.645161 -0.10483
2 1 6 11/21/18 0 1 20 12 0.625000 -0.13257
3 2 4 11/19/18 1 0 16 14 0.533333 -0.21667
4 2 8 11/23/18 1 0 17 14 0.548387 -0.21632
5 2 9 11/24/18 0 1 17 15 0.531250 -0.23346
6 3 2 11/17/18 0 1 24 8 0.750000 0.00000
7 3 5 11/20/18 1 0 25 8 0.757576 0.00000
8 3 7 11/22/18 1 0 26 8 0.764706 0.00000
So in game 9 on 11/24/18, team 2 lost and its winning% fell from 0.548387 to 0.531250. It therefore fell further behind the other two teams, which at that point stood at 0.625000 (team 1) and 0.764706 (team 3). So team 2's %back would be 0.531250 - 0.764706 = -0.233456.
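To make the arithmetic for that one row concrete, here is a small sketch in plain Python. The dictionary `latest` and the name `pct_back` are only illustrative; the numbers are the ones quoted above.

```python
# Latest winning% of each team as of 11/24/18 (values from the table above).
latest = {1: 0.625000, 2: 0.531250, 3: 0.764706}

team = 2
# Difference between team 2 and every team (itself included, which gives 0).
gaps = {other: latest[team] - pct for other, pct in latest.items()}
# Keep the largest deficit, i.e. the most negative difference.
pct_back = min(gaps.values())
print(round(pct_back, 6))  # -0.233456
```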
Finally, I need to calculate each team_id's place in the standings at that moment, i.e., on 11/24/18 the team_id ranking would be 3, 1, 2.
Thanks.
import pandas as pd

# start from the question's frame and sort by date
# (game_date is a string; this sorts correctly only because every date here shares the same month and year)
df = game_df.sort_values(by='game_date')
# add a column for each team's latest %age, fill forward NaN (but not back)
for team_id in df['team_id'].unique():
    df[str(team_id) + 'win_%'] = (
        df.loc[df.team_id == team_id]
        .set_index('game_date')['winning%']
        .reindex(df.game_date).sort_index().ffill().values
    )
# teams that have not played yet are still NaN; fill those with 0
df = df.fillna(0)
# get min difference (greatest negative) for each row
helper_cols = [str(team_id) + 'win_%' for team_id in df['team_id'].unique()]
df['w%_bac'] = pd.concat([df['winning%'] - df[col] for col in helper_cols], axis=1).min(axis=1)
# drop helper columns
df = df.drop(columns=helper_cols)
df
team_id game_id game_date w l wins losses winning% w%_bac
0 1 1 11/16/18 1 0 20 10 0.667 0.000
6 3 2 11/17/18 0 1 24 8 0.750 0.000
1 1 3 11/18/18 0 1 20 11 0.645 -0.105
3 2 4 11/19/18 1 0 16 14 0.533 -0.217
7 3 5 11/20/18 1 0 25 8 0.758 0.000
2 1 6 11/21/18 0 1 20 12 0.625 -0.133
8 3 7 11/22/18 1 0 26 8 0.765 0.000
4 2 8 11/23/18 1 0 17 14 0.548 -0.216
5 2 9 11/24/18 0 1 17 15 0.531 -0.233
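The code above does not cover the last part of the question, the standings on a given date. Below is a minimal sketch of one way to get them. It assumes each team plays at most one game per date (otherwise `pivot` raises on duplicate entries) and rebuilds only the columns it needs from the question's table. The names `latest`, `ranks`, `leader` and `order` are illustrative, not part of the original answer.

```python
import pandas as pd

# Columns copied from the question's table.
game_df = pd.DataFrame({
    'team_id':   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'game_id':   [1, 3, 6, 4, 8, 9, 2, 5, 7],
    'game_date': ['11/16/18', '11/18/18', '11/21/18', '11/19/18', '11/23/18',
                  '11/24/18', '11/17/18', '11/20/18', '11/22/18'],
    'winning%':  [0.666667, 0.645161, 0.625000, 0.533333, 0.548387,
                  0.531250, 0.750000, 0.757576, 0.764706],
})

df = game_df.copy()
# Parse the dates so sorting does not depend on string order.
df['date'] = pd.to_datetime(df['game_date'], format='%m/%d/%y')
df = df.sort_values('date')

# One row per date, one column per team, carrying each team's latest winning% forward.
latest = df.pivot(index='date', columns='team_id', values='winning%').ffill()

# Rank the teams within each date (1 = best); teams with no game yet stay NaN.
ranks = latest.rank(axis=1, ascending=False)

# Standings on 11/24/18, best team first.
order = latest.loc[pd.Timestamp('2018-11-24')].sort_values(ascending=False).index.tolist()
print(order)  # [3, 1, 2]

# The same wide table also gives the gap to the leader without hard-coded column names.
leader = latest.max(axis=1)
df['w%_bac'] = df['winning%'] - df['date'].map(leader)
```

Because `latest` holds one column per team, the same code should work for any number of teams. Unlike the fill-with-0 approach, teams that have not played yet are simply ignored by `max` and left unranked.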