Python - Pandas transpose gamelog data
I have a dataset (nba_data) that I'm having trouble transforming. What I want is to turn the following:
TEAM_ABBREVIATION GAME_DATE WinLoss HomeAway
ATL 2016-10-27 W H
ATL 2016-10-29 W A
ATL 2016-10-31 W H
ATL 2016-11-02 L H
BKN 2016-10-26 L A
BKN 2016-10-28 W H
BKN 2016-10-29 L A
BKN 2016-10-31 L H
into the following:
TEAM_ABBREVIATION GAME_DATE HomeWin HomeLoss AwayWin AwayLoss
ATL 2016-10-27 1 0 0 0
ATL 2016-10-29 1 0 1 0
ATL 2016-10-31 2 0 1 0
ATL 2016-11-02 2 1 1 0
BKN 2016-10-26 0 0 0 1
BKN 2016-10-28 1 0 0 1
BKN 2016-10-29 1 0 0 2
BKN 2016-10-31 1 1 0 2
Please help if you can.
Thanks, Tom
import pandas as pd
df = pd.DataFrame({'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02', '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'], 'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'], 'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'], 'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L']})
# One-hot encode the four home/away x win/loss combinations
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
# Running count of each event within each team
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
# Reverse-alphabetical order puts the columns as HW, HL, AW, AL
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin',
                                'HL':'HomeLoss', 'HW':'HomeWin'})
# Put the team and date columns back in front of the counts
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
print(result)
yields
TEAM_ABBREVIATION GAME_DATE HomeWin HomeLoss AwayWin AwayLoss
0 ATL 2016-10-27 1 0 0 0
1 ATL 2016-10-29 1 0 1 0
2 ATL 2016-10-31 2 0 1 0
3 ATL 2016-11-02 2 1 1 0
4 BKN 2016-10-26 0 0 0 1
5 BKN 2016-10-28 1 0 0 1
6 BKN 2016-10-29 1 0 0 2
7 BKN 2016-10-31 1 1 0 2
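If you would rather start from the table exactly as pasted in the question than type the dict by hand, one option is to parse it with pd.read_csv and a whitespace separator. This is a minimal sketch; RAW is simply the question's table copied verbatim, and GAME_DATE stays a plain string column unless you convert it yourself.

```python
import io

import pandas as pd

# The input table from the question, copied verbatim (whitespace-separated)
RAW = """\
TEAM_ABBREVIATION GAME_DATE WinLoss HomeAway
ATL 2016-10-27 W H
ATL 2016-10-29 W A
ATL 2016-10-31 W H
ATL 2016-11-02 L H
BKN 2016-10-26 L A
BKN 2016-10-28 W H
BKN 2016-10-29 L A
BKN 2016-10-31 L H
"""

# sep=r'\s+' splits each line on any run of whitespace
df = pd.read_csv(io.StringIO(RAW), sep=r'\s+')
print(df)
```

The resulting df has the same four columns as the one built from the dict above (in a different column order), so the rest of the answer applies unchanged.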
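The first idea is that there are 4 kinds of "events", corresponding to the 4 combinations of possible values in the WinLoss and HomeAway columns: (W,H), (W,A), (L,H) and (L,A).
So it is natural to combine the WinLoss and HomeAway columns into a single column: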
In [111]: df['HomeAway'] + df['WinLoss']
Out[111]:
0 HW
1 AW
2 HW
3 HL
4 AL
5 HW
6 AL
7 HL
dtype: object
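A possible variation on this step, sketched under the assumption that you want the final column names right away: map each two-letter code through a lookup dict (EVENT_NAMES is a name made up for this example). The later rename then becomes unnecessary, although the columns still have to be put in the desired order.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
})

# Hypothetical lookup table from the two-letter code to the final column name
EVENT_NAMES = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}

# Combine the two columns into one event code, then translate it
events = (df['HomeAway'] + df['WinLoss']).map(EVENT_NAMES)
print(events.tolist())
```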
Then use get_dummies to convert this Series into a table of 1s and 0s:
In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
Out[112]:
AL AW HL HW
0 0 0 0 1
1 0 1 0 0
2 0 0 0 1
3 0 0 1 0
4 1 0 0 0
5 0 0 0 1
6 1 0 0 0
7 0 0 1 0
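One caveat worth knowing: get_dummies only creates columns for codes that actually occur in the data. On the full sample all four codes appear, but on a smaller slice (for example a team that has not lost yet) a column would be missing and the later rename/reorder would not produce all four columns. A minimal sketch of one way to guard against that, assuming a fixed list of the four codes (ALL_EVENTS is a name chosen for this example), is to reindex with fill_value=0:

```python
import pandas as pd

# Hypothetical slice: ATL's first three games contain no losses at all
codes = pd.Series(['HW', 'AW', 'HW'])

# get_dummies only creates columns for codes that actually appear
dummies = pd.get_dummies(codes).astype('int')
print(list(dummies.columns))  # ['AW', 'HW']

# Reindex to a fixed column list so absent events become all-zero columns
ALL_EVENTS = ['HW', 'HL', 'AW', 'AL']
dummies = dummies.reindex(columns=ALL_EVENTS, fill_value=0)
print(dummies)
```

Because the column list is given explicitly, this also fixes the column order in the same step.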
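Now, comparing this with your desired result, we can see that we also want to take a cumulative sum, grouped by TEAM_ABBREVIATION (here result holds the table of dummies from the previous step):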
In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
Out[114]:
AL AW HL HW
0 0 0 0 1
1 0 1 0 1
2 0 1 0 2
3 0 1 1 2
4 1 0 0 0
5 1 0 0 1
6 2 0 0 1
7 2 0 1 1
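The cumulative sum runs in row order, so it only gives correct running totals if each team's games are already in date order, as they are in the sample. If your real data might not be sorted, a sketch of one way to handle it (the shuffled rows below are made up for illustration) is to sort by team and date first. It also uses groupby(...).cumsum(), which gives the same result as transform('cumsum') here.

```python
import pandas as pd

# Hypothetical unsorted game log: ATL's games are not in date order
df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL'],
    'GAME_DATE': ['2016-10-31', '2016-10-27', '2016-10-29'],
    'WinLoss': ['W', 'W', 'W'],
    'HomeAway': ['H', 'H', 'A'],
})

# Put each team's games in chronological order before accumulating
df['GAME_DATE'] = pd.to_datetime(df['GAME_DATE'])
df = df.sort_values(['TEAM_ABBREVIATION', 'GAME_DATE']).reset_index(drop=True)

dummies = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
# Running total of each event within each team
running = dummies.groupby(df['TEAM_ABBREVIATION']).cumsum()
print(pd.concat([df[['GAME_DATE']], running], axis='columns'))
```

reset_index(drop=True) keeps the index of df aligned with that of dummies, which the groupby key and the final concat rely on.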
The next two lines reorder and rename the columns:
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin',
                                'HL':'HomeLoss', 'HW':'HomeWin'})
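Sorting in reverse alphabetical order happens to produce HW, HL, AW, AL, which is the desired order. A slightly more explicit alternative, sketched here with a hypothetical COLUMN_NAMES dict, is to select the columns in the order you want and rename them in one chain, so the result does not depend on that alphabetical coincidence. It relies on dicts preserving insertion order (Python 3.7+).

```python
import pandas as pd

# Cumulative counts as produced by the cumsum step (first two ATL rows)
result = pd.DataFrame({'AL': [0, 0], 'AW': [0, 1], 'HL': [0, 0], 'HW': [1, 1]})

# Key order defines the output column order; values are the new names
COLUMN_NAMES = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}
result = result[list(COLUMN_NAMES)].rename(columns=COLUMN_NAMES)
print(result)
```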
Finally, we can use pd.concat to join df with result and build the desired DataFrame:
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
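pd.concat with axis='columns' lines rows up by index label, which works here because result kept the same index as df throughout. Since the frames share an index, DataFrame.join (a left join on the index by default) is an equivalent way to write this last step. A small check on a two-row excerpt of the data:

```python
import pandas as pd

# Two-row excerpt of the question's data and of the renamed counts
df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'ATL'],
    'GAME_DATE': ['2016-10-27', '2016-10-29'],
})
result = pd.DataFrame({'HomeWin': [1, 1], 'HomeLoss': [0, 0],
                       'AwayWin': [0, 1], 'AwayLoss': [0, 0]})

# Both frames share the same RangeIndex, so the index-aligned join
# produces the same frame as concatenating along the columns
joined = df[['TEAM_ABBREVIATION', 'GAME_DATE']].join(result)
concatenated = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
print(joined.equals(concatenated))  # True
```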