I have a dataset (nba_data) that I'm having trouble reshaping. I want to transform the following:
TEAM_ABBREVIATION GAME_DATE WinLoss HomeAway
ATL 2016-10-27 W H
ATL 2016-10-29 W A
ATL 2016-10-31 W H
ATL 2016-11-02 L H
BKN 2016-10-26 L A
BKN 2016-10-28 W H
BKN 2016-10-29 L A
BKN 2016-10-31 L H
into the following:
TEAM_ABBREVIATION GAME_DATE HomeWin HomeLoss AwayWin AwayLoss
ATL 2016-10-27 1 0 0 0
ATL 2016-10-29 1 0 1 0
ATL 2016-10-31 2 0 1 0
ATL 2016-11-02 2 1 1 0
BKN 2016-10-26 0 0 0 1
BKN 2016-10-28 1 0 0 1
BKN 2016-10-29 1 0 0 2
BKN 2016-10-31 1 1 0 2
If you could help, that would be great.
Thanks, Tom
import pandas as pd
df = pd.DataFrame({'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02', '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'], 'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'], 'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'], 'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L']})
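# One indicator column per HomeAway+WinLoss code: AL, AW, HL, HW
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
# Running totals within each team
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
# Descending label order: HW, HL, AW, AL
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL': 'AwayLoss', 'AW': 'AwayWin',
                                'HL': 'HomeLoss', 'HW': 'HomeWin'})
# Put the team and date columns in front of the counts
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')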
yields
TEAM_ABBREVIATION GAME_DATE HomeWin HomeLoss AwayWin AwayLoss
0 ATL 2016-10-27 1 0 0 0
1 ATL 2016-10-29 1 0 1 0
2 ATL 2016-10-31 2 0 1 0
3 ATL 2016-11-02 2 1 1 0
4 BKN 2016-10-26 0 0 0 1
5 BKN 2016-10-28 1 0 0 1
6 BKN 2016-10-29 1 0 0 2
7 BKN 2016-10-31 1 1 0 2
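One caveat: the running totals depend on row order. The sample data is already sorted by team and date, but if nba_data might not be, a sort should come first. A minimal sketch, wrapping the same steps in a hypothetical running_record function (the name is illustrative, not from the question) and assuming GAME_DATE is stored as 'YYYY-MM-DD' strings:

```python
import pandas as pd


def running_record(games):
    """Hypothetical helper: the steps above, with a sort added first."""
    # cumsum is order-dependent, so put each team's games in date order.
    # ISO 'YYYY-MM-DD' strings sort chronologically without parsing.
    games = games.sort_values(['TEAM_ABBREVIATION', 'GAME_DATE'])
    counts = pd.get_dummies(games['HomeAway'] + games['WinLoss']).astype('int')
    counts = counts.groupby(games['TEAM_ABBREVIATION']).transform('cumsum')
    counts = counts.sort_index(axis='columns', ascending=False)
    counts = counts.rename(columns={'AL': 'AwayLoss', 'AW': 'AwayWin',
                                    'HL': 'HomeLoss', 'HW': 'HomeWin'})
    return pd.concat([games[['TEAM_ABBREVIATION', 'GAME_DATE']], counts],
                     axis='columns')


df = pd.DataFrame({
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
})

# Shuffle the rows to simulate input that is not already in date order
shuffled = df.sample(frac=1, random_state=0)
out = running_record(shuffled)
print(out)
```

With the sort in place, the shuffled input produces the same table as the sorted sample.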
The first idea is that there are 4 kinds of "events", corresponding to the 4 combinations of possible values from the WinLoss and HomeAway columns: (W,H), (W,A), (L,H) and (L,A).
Thus it is natural to combine the WinLoss and HomeAway columns into a single column:
In [111]: df['HomeAway'] + df['WinLoss']
Out[111]:
0 HW
1 AW
2 HW
3 HL
4 AL
5 HW
6 AL
7 HL
dtype: object
and then use get_dummies to convert this Series into a table of 1's and 0's (the .astype('int') turns the indicator columns into integers):
In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
Out[112]:
AL AW HL HW
0 0 0 0 1
1 0 1 0 0
2 0 0 0 1
3 0 0 1 0
4 1 0 0 0
5 0 0 0 1
6 1 0 0 0
7 0 0 1 0
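pd.get_dummies only creates columns for codes that actually occur, so a subset with, say, no away losses would come back without an AL column. A minimal sketch, assuming HW, HL, AW and AL are the only possible codes: casting the combined Series to a pd.CategoricalDtype with those four categories makes get_dummies emit one column per category (in category order, unused ones included), and its dtype=int argument can stand in for .astype('int') (in pandas 2.x the indicators are bool by default). The indicator_table helper is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
})

# Assumption: these four codes are the only possible HomeAway+WinLoss values
event_type = pd.CategoricalDtype(categories=['HW', 'HL', 'AW', 'AL'])


def indicator_table(games):
    # Build the two-letter code, then cast it to the fixed categorical type
    codes = (games['HomeAway'] + games['WinLoss']).astype(event_type)
    # One integer column per category, even for categories that never occur
    return pd.get_dummies(codes, dtype=int)


full = indicator_table(df)
atl_only = indicator_table(df[df['TEAM_ABBREVIATION'] == 'ATL'])  # ATL has no away losses
print(full.columns.tolist())   # ['HW', 'HL', 'AW', 'AL']
print(atl_only['AL'].tolist())  # [0, 0, 0, 0]
```

Because the categories are listed in the target order, the later sort_index step would not be needed with this variant.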
Now, comparing with your desired result, we can see that we also want a cumulative sum, grouped by TEAM_ABBREVIATION. With result set to the get_dummies table above:
In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
Out[114]:
AL AW HL HW
0 0 0 0 1
1 0 1 0 1
2 0 1 0 2
3 0 1 1 2
4 1 0 0 0
5 1 0 0 1
6 2 0 0 1
7 2 0 1 1
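GroupBy objects also have a cumsum() method, which returns the same per-team running totals, aligned to the original rows, as transform('cumsum'). A quick check on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
})
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')

# Both compute a running total within each team, keeping df's row order
via_method = result.groupby(df['TEAM_ABBREVIATION']).cumsum()
via_transform = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')

print(via_method.equals(via_transform))  # True
```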
The next two lines reorder and rename the columns. Sorting the labels in descending order gives HW, HL, AW, AL, which matches the order you asked for:
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin',
                                'HL':'HomeLoss', 'HW':'HomeWin'})
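The descending sort works here only because the reverse-alphabetical order of the codes happens to match the target order. A sketch that states the order explicitly instead, assuming a Python 3.7+ dict (which keeps insertion order) drives both the renaming and the column selection; names is a hypothetical variable:

```python
import pandas as pd

df = pd.DataFrame({
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
})
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')

# The dict's insertion order defines the final column order
names = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}
result = result.rename(columns=names)[list(names.values())]
print(result.columns.tolist())  # ['HomeWin', 'HomeLoss', 'AwayWin', 'AwayLoss']
```

Selecting by name raises a KeyError if some code never occurred, which at least fails loudly instead of silently shifting columns.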
Finally, we can use pd.concat to place the TEAM_ABBREVIATION and GAME_DATE columns of df next to result and build the desired DataFrame:
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
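Since result was built from df's rows, both frames share the same index, so DataFrame.join (which aligns on the index by default) should give the same table as the concat call. A small sketch to confirm that on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
})
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL': 'AwayLoss', 'AW': 'AwayWin',
                                'HL': 'HomeLoss', 'HW': 'HomeWin'})

left = df[['TEAM_ABBREVIATION', 'GAME_DATE']]
# Rows line up by index label, exactly as pd.concat(axis='columns') aligns them
joined = left.join(result)
concatenated = pd.concat([left, result], axis='columns')
print(joined.equals(concatenated))  # True
```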