Reshaping a pandas DataFrame when calculations are linked to two columns

Question

From this DataFrame, I want to compute various statistics at the team level:

import pandas as pd

data = [['20-10-2020', 'PSG', 'Man U', 1, 2], ['20-10-2020', 'Leipzig', 'Istanbul', 2, 0],
        ['27-10-2020', 'Istanbul', 'PSG', 0, 2], ['27-10-2020', 'Man U', 'Leipzig', 5, 0]]
df = pd.DataFrame(data, columns=['Date', 'Home', 'Away', 'HG', 'AG'])
print(df)

         Date      Home      Away  HG  AG
0  20-10-2020       PSG     Man U   1   2
1  20-10-2020   Leipzig  Istanbul   2   0
2  27-10-2020  Istanbul       PSG   0   2
3  27-10-2020     Man U   Leipzig   5   0

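For example, for each team I compute the points and the goals it scored in its previous game. The naive implementation creates two DataFrames, one for the home team and one for the away team, and concatenates them. I tried using melt, but I couldn't find the syntax that produces the DataFrame I want.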

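# Home perspective: the home side is "team", the away side is "opponent".
df_home = df.reset_index(level=0)
columns = {
    "Date": 'date',
    "Home": "team",
    "Away": "opponent",
    'HG': 'team_goals',
    'AG': 'opponent_goals',
}
df_home = df_home.rename(columns=columns)
df_home['site'] = 'H'

# Away perspective: the same games with the roles (and goal columns) swapped.
df_away = df.reset_index(level=0)
columns = {
    "Date": 'date',
    "Home": "opponent",
    "Away": "team",
    'HG': 'opponent_goals',
    'AG': 'team_goals',
}
df_away = df_away.rename(columns=columns)
df_away['site'] = 'A'

# Stack both perspectives: one row per team per game; the "index" column keeps the game id.
# Note: 'date' is a dd-mm-yyyy string, so this sort is chronological here only because both
# dates fall in the same month. kind='stable' keeps home rows before away rows within a date.
df_team = pd.concat([df_home, df_away], ignore_index=True).sort_values(['date'], kind='stable')
df_team['team'] = df_team['team'].astype('category')
df_team['opponent'] = df_team['opponent'].astype('category')
print(df_team)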

   index        date      team  opponent  team_goals  opponent_goals site
0      0  20-10-2020       PSG     Man U           1               2    H
1      1  20-10-2020   Leipzig  Istanbul           2               0    H
4      0  20-10-2020     Man U       PSG           2               1    A
5      1  20-10-2020  Istanbul   Leipzig           0               2    A
2      2  27-10-2020  Istanbul       PSG           0               2    H
3      3  27-10-2020     Man U   Leipzig           5               0    H
6      2  27-10-2020       PSG  Istanbul           2               0    A
7      3  27-10-2020   Leipzig     Man U           0               5    A
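Since melt was the sticking point, here is a minimal sketch of building the same long frame with DataFrame.melt instead of two rename dictionaries. It is an assumption-laden alternative, not the asker's code: the helper columns H and A are my own naming trick (temporary copies of Home/Away, so melt's var_name directly yields the site flag), and the dd-mm-yyyy strings are parsed so that sorting is chronological rather than lexicographic. Rows come out grouped by game within each date instead of by site, which does not change per-team shifts as long as each team plays at most once per date.

```python
import numpy as np
import pandas as pd

data = [['20-10-2020', 'PSG', 'Man U', 1, 2], ['20-10-2020', 'Leipzig', 'Istanbul', 2, 0],
        ['27-10-2020', 'Istanbul', 'PSG', 0, 2], ['27-10-2020', 'Man U', 'Leipzig', 5, 0]]
df = pd.DataFrame(data, columns=['Date', 'Home', 'Away', 'HG', 'AG'])

# Copy the team columns under the site labels ('H'/'A') so melt can stack them
# while Home/Away stay available as id columns for computing the opponent.
long = (
    df.reset_index()
    .assign(H=lambda d: d["Home"], A=lambda d: d["Away"])
    .melt(id_vars=["index", "Date", "Home", "Away", "HG", "AG"],
          value_vars=["H", "A"], var_name="site", value_name="team")
)

# Pick the opponent and goal columns depending on which side the team played.
is_home = long["site"].eq("H")
long["opponent"] = np.where(is_home, long["Away"], long["Home"])
long["team_goals"] = np.where(is_home, long["HG"], long["AG"])
long["opponent_goals"] = np.where(is_home, long["AG"], long["HG"])

# Parse dd-mm-yyyy strings so the sort below is chronological.
long["date"] = pd.to_datetime(long["Date"], format="%d-%m-%Y")

df_team = (
    long[["index", "date", "team", "opponent", "team_goals", "opponent_goals", "site"]]
    .sort_values(["date", "index"])
    .reset_index(drop=True)
)
print(df_team)
```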

With this DataFrame I can compute statistics grouped by the team column:

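import numpy as np

# 3 points for a win, 1 for a draw, 0 for a loss.
conditions = [df_team['team_goals'] > df_team['opponent_goals'], df_team['team_goals'] == df_team['opponent_goals']]
choices = [3, 1]
df_team['pts'] = np.select(conditions, choices, default=0)
# Value from the team's previous game (shift(1) excludes the current game).
f = lambda x: x.shift(1).rolling(1).sum()
# transform (rather than apply) returns a result aligned with df_team's index;
# observed=True groups only on teams that actually occur in the categorical column.
df_team['form_l1_before'] = df_team.groupby('team', observed=True)['pts'].transform(f)
df_team['goal_l1_before'] = df_team.groupby('team', observed=True)['team_goals'].transform(f)
print(df_team)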

   index        date      team  opponent  team_goals  opponent_goals site  \
0      0  20-10-2020       PSG     Man U           1               2    H   
1      1  20-10-2020   Leipzig  Istanbul           2               0    H   
4      0  20-10-2020     Man U       PSG           2               1    A   
5      1  20-10-2020  Istanbul   Leipzig           0               2    A   
2      2  27-10-2020  Istanbul       PSG           0               2    H   
3      3  27-10-2020     Man U   Leipzig           5               0    H   
6      2  27-10-2020       PSG  Istanbul           2               0    A   
7      3  27-10-2020   Leipzig     Man U           0               5    A   

   pts  form_l1_before  goal_l1_before  
0    0             NaN             NaN  
1    3             NaN             NaN  
4    3             NaN             NaN  
5    0             NaN             NaN  
2    0             0.0             0.0  
3    3             3.0             2.0  
6    3             0.0             1.0  
7    0             3.0             2.0
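Because shift(1).rolling(1).sum() reduces to shift(1), the lambda only pays off once the window covers more than one previous game. A minimal sketch of that generalization follows; add_last_n is a hypothetical helper, the toy frame copies the values printed above (rows already in chronological order), and min_periods=1 is my own choice so that teams with fewer than n earlier games still get a partial sum instead of NaN.

```python
import numpy as np
import pandas as pd

# Toy long-format frame with the same rows and values as df_team above.
df_team = pd.DataFrame({
    "index": [0, 1, 0, 1, 2, 3, 2, 3],
    "team": ["PSG", "Leipzig", "Man U", "Istanbul", "Istanbul", "Man U", "PSG", "Leipzig"],
    "team_goals": [1, 2, 2, 0, 0, 5, 2, 0],
    "opponent_goals": [2, 0, 1, 2, 2, 0, 0, 5],
})

# 3 points for a win, 1 for a draw, 0 for a loss.
df_team["pts"] = np.select(
    [df_team["team_goals"] > df_team["opponent_goals"],
     df_team["team_goals"] == df_team["opponent_goals"]],
    [3, 1], default=0)

def add_last_n(frame, col, n):
    """Hypothetical helper: sum of `col` over each team's previous n games.

    shift(1) drops the current game; rows must already be in chronological order.
    """
    return (frame.groupby("team")[col]
                 .transform(lambda s: s.shift(1).rolling(n, min_periods=1).sum()))

df_team["form_l1_before"] = add_last_n(df_team, "pts", 1)
df_team["goal_l1_before"] = add_last_n(df_team, "team_goals", 1)
print(df_team)
```

With more data, add_last_n(df_team, "pts", 5) would give the points collected over the last five games.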

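The problem is that I then want to convert the DataFrame back to one row per game (identified by the index column), with every statistic in its own column for the home side and for the away side: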

# E.g. the second game, Istanbul vs PSG, with stats from each team's previous game
expected_data = [['27-10-2020', 'Istanbul','PSG',0,2,0,0,0,1]]
df_target = pd.DataFrame(expected_data, columns = ['date', 'Home', 'Away', 'HG', 'AG', 'Home_form_l1_before', 'Home_goal_l1_before', 'Away_form_l1_before', 'Away_goal_l1_before'])
print(df_target)
         date      Home Away  HG  AG  Home_form_l1_before  \
0  27-10-2020  Istanbul  PSG   0   2                    0   

   Home_goal_l1_before  Away_form_l1_before  Away_goal_l1_before  
0                    0                    0                    1

Answer 1

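Here is one approach. We can reshape df_team using the site flag, then take all the information from the H (home) perspective, except for the fields you need for both home and away (ha_fields). The latter are kept for both sites and joined onto the home data.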

ha_fields = ["form_l1_before", "goal_l1_before"]

unstacked_team = df_team.set_index(["index", "site", "date"]).unstack("site")

ha_df = unstacked_team[ha_fields]
ha_df.columns = ha_df.columns.to_flat_index().map(lambda t: "_".join([t[1], t[0]]))

df_final = (
    unstacked_team.swaplevel(axis=1)["H"]
    .drop(ha_fields, axis=1)
    .join(ha_df)
    .reset_index("date")
)

print(df_final)

             date      team  opponent  team_goals  opponent_goals  pts  \
index                                                                    
0      20-10-2020       PSG     Man U           1               2    0   
1      20-10-2020   Leipzig  Istanbul           2               0    3   
2      27-10-2020  Istanbul       PSG           0               2    0   
3      27-10-2020     Man U   Leipzig           5               0    3   

       A_form_l1_before  H_form_l1_before  A_goal_l1_before  H_goal_l1_before  
index                                                                          
0                   NaN               NaN               NaN               NaN  
1                   NaN               NaN               NaN               NaN  
2                   0.0               0.0               1.0               0.0  
3                   3.0               3.0               2.0               2.0
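The output above keeps the home-perspective column names (team, opponent, team_goals, ...) and H_/A_ prefixes. If the exact layout from the question is wanted, a minimal sketch using DataFrame.pivot and a join back onto the original df could look like this. The stats frame hard-codes the values printed for df_team above, the site_names mapping is an assumption for illustration, and the join relies on df still having its default RangeIndex, so that it lines up with the game id in the "index" column.

```python
import numpy as np
import pandas as pd

# Original one-row-per-game frame from the question.
df = pd.DataFrame(
    [['20-10-2020', 'PSG', 'Man U', 1, 2], ['20-10-2020', 'Leipzig', 'Istanbul', 2, 0],
     ['27-10-2020', 'Istanbul', 'PSG', 0, 2], ['27-10-2020', 'Man U', 'Leipzig', 5, 0]],
    columns=['Date', 'Home', 'Away', 'HG', 'AG'])

# Per-team stats in long format (values copied from the df_team output above).
stats = pd.DataFrame({
    "index": [0, 1, 0, 1, 2, 3, 2, 3],
    "site": ["H", "H", "A", "A", "H", "H", "A", "A"],
    "form_l1_before": [np.nan, np.nan, np.nan, np.nan, 0.0, 3.0, 0.0, 3.0],
    "goal_l1_before": [np.nan, np.nan, np.nan, np.nan, 0.0, 2.0, 1.0, 2.0],
})

ha_fields = ["form_l1_before", "goal_l1_before"]
# One row per game; columns are (stat, site) pairs.
wide = stats.pivot(index="index", columns="site", values=ha_fields)

# Flatten to names like "Home_form_l1_before" and order them as in the question.
site_names = {"H": "Home", "A": "Away"}
wide.columns = [f"{site_names[site]}_{stat}" for stat, site in wide.columns]
wide = wide[[f"{side}_{field}" for side in ("Home", "Away") for field in ha_fields]]

# df's RangeIndex matches the game ids in wide's index.
df_final = df.join(wide)
print(df_final)
```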

Answer 1 (score 0, accepted) 2020-12-31 10:47:52