繁体   English   中英

Python-熊猫转置Gamelog数据

[英]Python - Pandas transpose gamelog data

我有一个难以转换的数据集(nba_data)。 我想要的是改变以下内容,

TEAM_ABBREVIATION   GAME_DATE   WinLoss   HomeAway
ATL                 2016-10-27  W             H
ATL                 2016-10-29  W             A
ATL                 2016-10-31  W             H
ATL                 2016-11-02  L             H
BKN                 2016-10-26  L             A
BKN                 2016-10-28  W             H
BKN                 2016-10-29  L             A
BKN                 2016-10-31  L             H

到以下

TEAM_ABBREVIATION   GAME_DATE   HomeWin HomeLoss AwayWin AwayLoss
ATL                2016-10-27     1        0         0      0
ATL                2016-10-29     1        0         1      0
ATL                2016-10-31     2        0         1      0
ATL                2016-11-02     2        1         1      0
BKN                2016-10-26     0        0         0      1
BKN                2016-10-28     1        0         0      1
BKN                2016-10-29     1        0         0      2
BKN                2016-10-31     1        1         0      2

如果可以的话,请帮忙。

谢谢汤姆

import pandas as pd

df = pd.DataFrame({'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02', '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'], 'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'], 'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'], 'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L']})

result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin', 
                                'HL':'HomeLoss', 'HW':'HomeWin'})
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')

产量

  TEAM_ABBREVIATION   GAME_DATE  HomeWin  HomeLoss  AwayWin  AwayLoss
0               ATL  2016-10-27        1         0        0         0
1               ATL  2016-10-29        1         0        1         0
2               ATL  2016-10-31        2         0        1         0
3               ATL  2016-11-02        2         1        1         0
4               BKN  2016-10-26        0         0        0         1
5               BKN  2016-10-28        1         0        0         1
6               BKN  2016-10-29        1         0        0         2
7               BKN  2016-10-31        1         1        0         2

第一个想法是,与WinLossHomeAway列中的可能值的4种组合相对应,有4种“事件”: (W,H)(W,A)(L,H)(L,A)

因此,很自然地希望将WinLossHomeAway列合并为一个列:

In [111]: df['HomeAway'] + df['WinLoss']
Out[111]: 
0    HW
1    AW
2    HW
3    HL
4    AL
5    HW
6    AL
7    HL
dtype: object

然后使用get_dummies将此系列转换为1和0的表:

In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
Out[112]: 
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   0
2   0   0   0   1
3   0   0   1   0
4   1   0   0   0
5   0   0   0   1
6   1   0   0   0
7   0   0   1   0

现在,通过与您期望的结果进行比较,我们可以看到我们也希望取一个累加的总和,按TEAM_ABBREVIATION分组:

In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
Out[114]: 
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   1
2   0   1   0   2
3   0   1   1   2
4   1   0   0   0
5   1   0   0   1
6   2   0   0   1
7   2   0   1   1

接下来的两行重新排序并重命名各列:

result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin', 
                                'HL':'HomeLoss', 'HW':'HomeWin'})

最后,我们可以使用pd.concatdfresult连接起来并构建所需的DataFrame:

result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM