
Python - Pandas transpose gamelog data

I have a dataset (nba_data) which I'm having trouble transposing. What I want is to transform the following,

TEAM_ABBREVIATION   GAME_DATE   WinLoss   HomeAway
ATL                 2016-10-27  W             H
ATL                 2016-10-29  W             A
ATL                 2016-10-31  W             H
ATL                 2016-11-02  L             H
BKN                 2016-10-26  L             A
BKN                 2016-10-28  W             H
BKN                 2016-10-29  L             A
BKN                 2016-10-31  L             H

to the following,

TEAM_ABBREVIATION   GAME_DATE   HomeWin HomeLoss AwayWin AwayLoss
ATL                2016-10-27     1        0         0      0
ATL                2016-10-29     1        0         1      0
ATL                2016-10-31     2        0         1      0
ATL                2016-11-02     2        1         1      0
BKN                2016-10-26     0        0         0      1
BKN                2016-10-28     1        0         0      1
BKN                2016-10-29     1        0         0      2
BKN                2016-10-31     1        1         0      2

If you could please help that would be great.

Thanks, Tom

import pandas as pd

df = pd.DataFrame({'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02', '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'], 'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'], 'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'], 'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L']})

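# One 0/1 indicator column per event code (AL, AW, HL, HW)
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
# Running totals within each team
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
# Descending column order gives HW, HL, AW, AL
result = result.sort_index(axis='columns', ascending=False)
# Spell out the two-letter event codes
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin',
                                'HL':'HomeLoss', 'HW':'HomeWin'})
# Attach the team and date columns from the original DataFrame
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')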

yields

  TEAM_ABBREVIATION   GAME_DATE  HomeWin  HomeLoss  AwayWin  AwayLoss
0               ATL  2016-10-27        1         0        0         0
1               ATL  2016-10-29        1         0        1         0
2               ATL  2016-10-31        2         0        1         0
3               ATL  2016-11-02        2         1        1         0
4               BKN  2016-10-26        0         0        0         1
5               BKN  2016-10-28        1         0        0         1
6               BKN  2016-10-29        1         0        0         2
7               BKN  2016-10-31        1         1        0         2

The first idea is that there are 4 kinds of "events" corresponding to the 4 combinations of possible values from the WinLoss and HomeAway columns: (W,H), (W,A), (L,H) and (L,A).

Thus it is natural to want to combine the WinLoss and HomeAway columns into a single column:

In [111]: df['HomeAway'] + df['WinLoss']
Out[111]: 
0    HW
1    AW
2    HW
3    HL
4    AL
5    HW
6    AL
7    HL
dtype: object

and then use get_dummies to convert this Series into a table of 1's and 0's:

In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
Out[112]: 
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   0
2   0   0   0   1
3   0   0   1   0
4   1   0   0   0
5   0   0   0   1
6   1   0   0   0
7   0   0   1   0
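One caveat that goes beyond the original answer: get_dummies only creates columns for values that actually occur, so if no game in the data is, say, an away loss, the AL column will be missing. A minimal sketch of one way around that, assuming you are happy to declare the four event codes up front with pd.Categorical (the sample frame is made up for illustration):

```python
import pandas as pd

# Hypothetical sample in which 'HL' and 'AL' never occur.
sample = pd.DataFrame({'HomeAway': ['H', 'A', 'H'],
                       'WinLoss': ['W', 'W', 'W']})

# Declaring all four event codes makes get_dummies emit a column for each,
# including codes that are absent from the data.
events = pd.Categorical(sample['HomeAway'] + sample['WinLoss'],
                        categories=['HW', 'HL', 'AW', 'AL'])
dummies = pd.get_dummies(events).astype('int')
print(dummies)
```

The columns come out in the order the categories were listed, with all-zero columns for the codes that never appear.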

Now, by comparison with your desired result, we can see that we also want to take a cumulative sum, grouped by TEAM_ABBREVIATION (here result holds the get_dummies table from the previous step):

In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
Out[114]: 
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   1
2   0   1   0   2
3   0   1   1   2
4   1   0   0   0
5   1   0   0   1
6   2   0   0   1
7   2   0   1   1
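The cumulative sum follows row order, so it only gives a running record if each team's games are already sorted by date, as they are in the question. If that is not guaranteed, something like the following sketch could be used first. The shuffled frame is invented for illustration, and sorting the date strings works here only because they are in ISO YYYY-MM-DD format:

```python
import pandas as pd

# Hypothetical input: three ATL games from the question, deliberately out of order.
shuffled = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL'],
    'GAME_DATE': ['2016-10-31', '2016-10-27', '2016-10-29'],
    'HomeAway': ['H', 'H', 'A'],
    'WinLoss': ['W', 'W', 'W'],
})

# Sort chronologically within each team and reset the index so that
# later column-wise concatenation lines up row for row.
ordered = (shuffled.sort_values(['TEAM_ABBREVIATION', 'GAME_DATE'])
                   .reset_index(drop=True))

dummies = pd.get_dummies(ordered['HomeAway'] + ordered['WinLoss']).astype('int')
running = dummies.groupby(ordered['TEAM_ABBREVIATION']).cumsum()
print(pd.concat([ordered[['GAME_DATE']], running], axis='columns'))
```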

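The next two lines reorder and rename the columns. Sorting the column labels in descending order gives HW, HL, AW, AL, which matches the column order in your desired result: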

result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin', 
                                'HL':'HomeLoss', 'HW':'HomeWin'})
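The descending sort happens to produce the right order for these four labels. As an alternative sketch (not what the answer above does), the order can be stated explicitly with reindex. The counts frame below is a stand-in for the cumulative table, and the names mapping is a hypothetical helper:

```python
import pandas as pd

# Stand-in for the cumulative-sum table (ATL rows only).
counts = pd.DataFrame({'AL': [0, 0, 0, 0], 'AW': [0, 1, 1, 1],
                       'HL': [0, 0, 0, 1], 'HW': [1, 1, 2, 2]})

# Hypothetical mapping; its key order doubles as the desired column order.
names = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}

# reindex puts the columns in the listed order and would add a
# zero-filled column for any code that is missing from counts.
renamed = counts.reindex(columns=list(names), fill_value=0).rename(columns=names)
print(renamed)
```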

Finally, we can use pd.concat to concatenate df with result and build the desired DataFrame:

result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
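If you need this in more than one place, the steps can be wrapped in a function and checked against the table from the question. This is only a sketch: running_records is a made-up name, and it assumes the rows are already in date order within each team and share the default integer index.

```python
import pandas as pd

def running_records(games):
    """Hypothetical helper: cumulative home/away win/loss counts per team."""
    names = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}
    # One 0/1 indicator column per event code, in a fixed order.
    events = pd.get_dummies(games['HomeAway'] + games['WinLoss']).astype('int')
    events = events.reindex(columns=list(names), fill_value=0)
    # Running totals within each team, then readable column names.
    totals = events.groupby(games['TEAM_ABBREVIATION']).cumsum().rename(columns=names)
    return pd.concat([games[['TEAM_ABBREVIATION', 'GAME_DATE']], totals],
                     axis='columns')

df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'],
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
})

out = running_records(df)
print(out)
```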

