
Python - Pandas transpose gamelog data

I have a dataset (nba_data) that I am having trouble transforming. What I want is to change the following,

TEAM_ABBREVIATION   GAME_DATE   WinLoss   HomeAway
ATL                 2016-10-27  W             H
ATL                 2016-10-29  W             A
ATL                 2016-10-31  W             H
ATL                 2016-11-02  L             H
BKN                 2016-10-26  L             A
BKN                 2016-10-28  W             H
BKN                 2016-10-29  L             A
BKN                 2016-10-31  L             H

into the following:

TEAM_ABBREVIATION   GAME_DATE   HomeWin HomeLoss AwayWin AwayLoss
ATL                2016-10-27     1        0         0      0
ATL                2016-10-29     1        0         1      0
ATL                2016-10-31     2        0         1      0
ATL                2016-11-02     2        1         1      0
BKN                2016-10-26     0        0         0      1
BKN                2016-10-28     1        0         0      1
BKN                2016-10-29     1        0         0      2
BKN                2016-10-31     1        1         0      2

Please help if you can.

Thanks, Tom

import pandas as pd

df = pd.DataFrame({
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
    'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'],
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L']})

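# One 0/1 indicator column per (HomeAway, WinLoss) combination
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
# Running totals within each team
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
# Reorder the columns to HW, HL, AW, AL, then give them readable names
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin',
                                'HL':'HomeLoss', 'HW':'HomeWin'})
# Put the identifying columns back in front of the counts
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')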

yields

  TEAM_ABBREVIATION   GAME_DATE  HomeWin  HomeLoss  AwayWin  AwayLoss
0               ATL  2016-10-27        1         0        0         0
1               ATL  2016-10-29        1         0        1         0
2               ATL  2016-10-31        2         0        1         0
3               ATL  2016-11-02        2         1        1         0
4               BKN  2016-10-26        0         0        0         1
5               BKN  2016-10-28        1         0        0         1
6               BKN  2016-10-29        1         0        0         2
7               BKN  2016-10-31        1         1        0         2

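The first idea is that there are 4 kinds of "events", corresponding to the 4 combinations of possible values in the WinLoss and HomeAway columns: (W, H), (W, A), (L, H), (L, A).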

So it is natural to want to combine the WinLoss and HomeAway columns into a single column:

In [111]: df['HomeAway'] + df['WinLoss']
Out[111]: 
0    HW
1    AW
2    HW
3    HL
4    AL
5    HW
6    AL
7    HL
dtype: object
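
As a variation (a sketch of my own, not something the question requires), the one-letter codes could be mapped to full words before concatenating, so that the combined labels already read HomeWin, AwayLoss and so on. The two mapping dictionaries and the name event are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'HomeAway': ['H', 'A', 'H', 'H'],
                   'WinLoss': ['W', 'W', 'W', 'L']})

# Hypothetical lookup tables: one-letter code -> readable word
venue = {'H': 'Home', 'A': 'Away'}
outcome = {'W': 'Win', 'L': 'Loss'}

# Element-wise string concatenation of the two mapped columns
event = df['HomeAway'].map(venue) + df['WinLoss'].map(outcome)
print(event.tolist())
# ['HomeWin', 'AwayWin', 'HomeWin', 'HomeLoss']
```

With labels like these, the columns produced by get_dummies would not need renaming, although they would still need to be put in the desired order.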

Then use get_dummies to convert this Series into a table of 1s and 0s:

In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
Out[112]: 
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   0
2   0   0   0   1
3   0   0   1   0
4   1   0   0   0
5   0   0   0   1
6   1   0   0   0
7   0   0   1   0
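
One caveat worth keeping in mind: get_dummies only creates columns for values that actually occur. If, say, no team in your data has lost yet, the HL and AL columns would be missing. A minimal sketch of one way to guard against that, assuming you know the full list of codes up front (the names events and all_events are made up for this example):

```python
import pandas as pd

# A tiny series in which no losses occur, so 'HL' and 'AL' never appear
events = pd.Series(['HW', 'AW', 'HW'])

# Assumed full set of event codes, in the desired column order
all_events = ['HW', 'HL', 'AW', 'AL']

dummies = pd.get_dummies(events).astype('int')
# reindex adds the missing columns, filled with 0, and fixes the column order
dummies = dummies.reindex(columns=all_events, fill_value=0)
print(dummies)
```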

Now, by comparing with your desired result, we can see that we also want to take a cumulative sum, grouped by TEAM_ABBREVIATION:

In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
Out[114]: 
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   1
2   0   1   0   2
3   0   1   1   2
4   1   0   0   0
5   1   0   0   1
6   2   0   0   1
7   2   0   1   1
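
Note that a cumulative sum follows the row order, so this assumes each team's games are already in chronological order, as they are in your sample. If that is not guaranteed, a sketch like the following sorts first (the shuffled sample data here is invented for the example; ISO-formatted date strings sort correctly as text):

```python
import pandas as pd

# Invented sample with the games deliberately out of order
df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'BKN', 'ATL', 'BKN'],
    'GAME_DATE': ['2016-10-29', '2016-10-28', '2016-10-27', '2016-10-26'],
    'HomeAway': ['A', 'H', 'H', 'A'],
    'WinLoss': ['W', 'W', 'W', 'L'],
})

# Sort by team, then date, and reset the index so later steps align cleanly
df = df.sort_values(['TEAM_ABBREVIATION', 'GAME_DATE']).reset_index(drop=True)

dummies = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
# groupby(...).cumsum() gives the same per-team running totals as transform('cumsum')
running = dummies.groupby(df['TEAM_ABBREVIATION']).cumsum()
print(running)
```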

The next two lines reorder and rename the columns:

result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin', 
                                'HL':'HomeLoss', 'HW':'HomeWin'})
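
Sorting the column labels in descending order happens to give HW, HL, AW, AL, which is exactly the order you asked for. If you prefer not to depend on that alphabetical coincidence, a possible alternative is to select the columns explicitly. This is only a sketch; counts stands in for the cumulative table above, and it assumes all four columns are present:

```python
import pandas as pd

# Stand-in for the cumulative counts table
counts = pd.DataFrame({'AL': [0, 1], 'AW': [1, 1], 'HL': [0, 0], 'HW': [1, 2]})

# The dict's key order doubles as the desired column order
names = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}

# Select the columns in that order, then rename them
ordered = counts[list(names)].rename(columns=names)
print(list(ordered.columns))
# ['HomeWin', 'HomeLoss', 'AwayWin', 'AwayLoss']
```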

Finally, we can use pd.concat to join df and result together and build the desired DataFrame:

result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
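
pd.concat with axis='columns' lines the two frames up by index label, which works here because result was derived from df and shares its index. For what it is worth, DataFrame.join should give the same frame in this situation. A small sketch with stand-in data (counts is a made-up placeholder for the renamed cumulative table):

```python
import pandas as pd

df = pd.DataFrame({'TEAM_ABBREVIATION': ['ATL', 'ATL'],
                   'GAME_DATE': ['2016-10-27', '2016-10-29']})
# Placeholder for the cumulative counts; it shares df's default 0..n-1 index
counts = pd.DataFrame({'HomeWin': [1, 1], 'AwayWin': [0, 1]})

# Both approaches align rows on the index labels
via_concat = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], counts], axis='columns')
via_join = df[['TEAM_ABBREVIATION', 'GAME_DATE']].join(counts)
print(via_concat.equals(via_join))
```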

