
Reshape pandas dataframe with calculation linked to 2 columns

From this dataframe, I want to compute various statistics at the team level.

import numpy as np
import pandas as pd

data = [['20-10-2020', 'PSG', 'Man U', 1, 2], ['20-10-2020', 'Leipzig','Istanbul',2,0], ['27-10-2020', 'Istanbul','PSG',0,2], ['27-10-2020', 'Man U','Leipzig',5,0]]
df = pd.DataFrame(data, columns = ['Date', 'Home', 'Away', 'HG', 'AG'])
print(df)

         Date      Home      Away  HG  AG
0  20-10-2020       PSG     Man U   1   2
1  20-10-2020   Leipzig  Istanbul   2   0
2  27-10-2020  Istanbul       PSG   0   2
3  27-10-2020     Man U   Leipzig   5   0

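For example, for each team I compute the points and the goals scored in its previous match. A simple implementation creates two dataframes, one from the home team's point of view and one from the away team's, and concatenates them. I tried to use melt, but I could not find the syntax that produces the dataframe I want.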

df_home = df.reset_index(level=0)
columns = {
    "Date": 'date',
    "Home": "team",
    "Away": "opponent",
    'HG': 'team_goals',
    'AG': 'opponent_goals',
}
df_home = df_home.rename(columns=columns)
df_home['site'] = 'H'

df_away = df.reset_index(level=0)
columns = {
    "Date": 'date',
    "Home": "opponent",
    "Away": "team",
    'HG': 'opponent_goals',
    'AG': 'team_goals',
}
df_away = df_away.rename(columns=columns)
df_away['site'] = 'A'

df_team = pd.concat([df_home, df_away], ignore_index=True).sort_values(['date'])
df_team['team'] = df_team['team'].astype('category')
df_team['opponent'] = df_team['opponent'].astype('category')
print(df_team)

   index        date      team  opponent  team_goals  opponent_goals site
0      0  20-10-2020       PSG     Man U           1               2    H
1      1  20-10-2020   Leipzig  Istanbul           2               0    H
4      0  20-10-2020     Man U       PSG           2               1    A
5      1  20-10-2020  Istanbul   Leipzig           0               2    A
2      2  27-10-2020  Istanbul       PSG           0               2    H
3      3  27-10-2020     Man U   Leipzig           5               0    H
6      2  27-10-2020       PSG  Istanbul           2               0    A
7      3  27-10-2020   Leipzig     Man U           0               5    A    
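
For reference, the melt attempt mentioned above can get most of the way to the same long layout. The following is only a sketch, not part of the original question: the name long_df is made up for illustration, and the opponent column is left out because melt alone does not carry it over. The goal columns still need to be swapped for away rows after melting.

```python
import pandas as pd

data = [
    ["20-10-2020", "PSG", "Man U", 1, 2],
    ["20-10-2020", "Leipzig", "Istanbul", 2, 0],
    ["27-10-2020", "Istanbul", "PSG", 0, 2],
    ["27-10-2020", "Man U", "Leipzig", 5, 0],
]
df = pd.DataFrame(data, columns=["Date", "Home", "Away", "HG", "AG"])

# Melt the two team columns into a single "team" column; "site" records
# which column each row came from ("Home" or "Away").
long_df = df.reset_index().melt(
    id_vars=["index", "Date", "HG", "AG"],
    value_vars=["Home", "Away"],
    var_name="site",
    value_name="team",
)

# Goals must be read from the team's own side: HG for home rows, AG for away rows.
is_home = long_df["site"].eq("Home")
long_df["team_goals"] = long_df["HG"].where(is_home, long_df["AG"])
long_df["opponent_goals"] = long_df["AG"].where(is_home, long_df["HG"])

long_df["site"] = long_df["site"].str[0]  # "Home" -> "H", "Away" -> "A"
long_df = (
    long_df.drop(columns=["HG", "AG"])
    .rename(columns={"Date": "date"})
    .sort_values("date", kind="stable")  # stable sort keeps home rows before away rows
)
print(long_df)
```

Whether this is clearer than the rename-and-concat version is a matter of taste; the swap of the goal columns is needed either way.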

With this dataframe I can compute statistics grouped by the team column.

conditions = [df_team['team_goals'] > df_team['opponent_goals'], df_team['team_goals'] == df_team['opponent_goals']]
choices = [3, 1]
df_team['pts'] = np.select(conditions, choices, default=0)
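# Previous match only: shift(1) excludes the current row, rolling(1) sums a window of one
f = lambda x: x.shift(1).rolling(1).sum()
# transform keeps df_team's index; apply may prepend the group key in newer pandas versions
df_team['form_l1_before'] = df_team.groupby(['team'])['pts'].transform(f)
df_team['goal_l1_before'] = df_team.groupby(['team'])['team_goals'].transform(f)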
print(df_team)

   index        date      team  opponent  team_goals  opponent_goals site  \
0      0  20-10-2020       PSG     Man U           1               2    H   
1      1  20-10-2020   Leipzig  Istanbul           2               0    H   
4      0  20-10-2020     Man U       PSG           2               1    A   
5      1  20-10-2020  Istanbul   Leipzig           0               2    A   
2      2  27-10-2020  Istanbul       PSG           0               2    H   
3      3  27-10-2020     Man U   Leipzig           5               0    H   
6      2  27-10-2020       PSG  Istanbul           2               0    A   
7      3  27-10-2020   Leipzig     Man U           0               5    A   

   pts  form_l1_before  goal_l1_before  
0    0             NaN             NaN  
1    3             NaN             NaN  
4    3             NaN             NaN  
5    0             NaN             NaN  
2    0             0.0             0.0  
3    3             3.0             2.0  
6    3             0.0             1.0  
7    0             3.0             2.0
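
The column names above suggest "last 1 match before this one". As a sketch of how the same shift-then-roll idea might extend to the last n matches: the helper name sum_before, the toy data, and the choice of min_periods=1 are all assumptions made for illustration, not part of the original question.

```python
import pandas as pd


def sum_before(series, n):
    """Sum of up to n previous values, excluding the current one."""
    # shift(1) pushes every value one row down, so the current match is not
    # part of its own window. min_periods=1 (an assumption) lets a team with
    # fewer than n earlier matches still get a value.
    return series.shift(1).rolling(n, min_periods=1).sum()


# Made-up points for two teams, rows already in chronological order.
games = pd.DataFrame({
    "team": ["A", "B", "A", "B", "A", "B"],
    "pts": [3, 0, 1, 3, 0, 3],
})

# transform returns one value per input row, aligned with the original index.
games["form_l2_before"] = games.groupby("team")["pts"].transform(
    lambda s: sum_before(s, 2)
)
print(games)
```

Each team's first match has no history, so its value stays NaN. The rows must be in chronological order within each team for the shift to mean "previous match".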

The problem is that I want to convert the dataframe back to one row per match (identified by the index column), with each statistic in its own column.

# Ex second game for Istanbul and PSG with stats from the previous game
expected_data = [['27-10-2020', 'Istanbul','PSG',0,2,0,0,0,1]]
df_target = pd.DataFrame(expected_data, columns = ['date', 'Home', 'Away', 'HG', 'AG', 'Home_form_l1_before', 'Home_goal_l1_before', 'Away_form_l1_before', 'Away_goal_l1_before'])
print(df_target)
         date      Home Away  HG  AG  Home_form_l1_before  \
0  27-10-2020  Istanbul  PSG   0   2                    0   

   Home_goal_l1_before  Away_form_l1_before  Away_goal_l1_before  
0                    0                    0                    1  

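Here is one approach. Reshape df_team using the site flag, then take everything from the H (home) perspective except the fields that are needed for both the home and the away team (ha_fields). Those fields are kept for both sites and joined onto the home data.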

ha_fields = ["form_l1_before", "goal_l1_before"]

unstacked_team = df_team.set_index(["index", "site", "date"]).unstack("site")

ha_df = unstacked_team[ha_fields]
ha_df.columns = ha_df.columns.to_flat_index().map(lambda t: "_".join([t[1], t[0]]))

df_final = (
    unstacked_team.swaplevel(axis=1)["H"]
    .drop(ha_fields, axis=1)
    .join(ha_df)
    .reset_index("date")
)

print(df_final)
             date      team  opponent  team_goals  opponent_goals  pts  \
index                                                                    
0      20-10-2020       PSG     Man U           1               2    0   
1      20-10-2020   Leipzig  Istanbul           2               0    3   
2      27-10-2020  Istanbul       PSG           0               2    0   
3      27-10-2020     Man U   Leipzig           5               0    3   

       A_form_l1_before  H_form_l1_before  A_goal_l1_before  H_goal_l1_before  
index                                                                          
0                   NaN               NaN               NaN               NaN  
1                   NaN               NaN               NaN               NaN  
2                   0.0               0.0               1.0               0.0  
3                   3.0               3.0               2.0               2.0  
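
The result above uses H_ and A_ prefixes and the long-format column names, while the question asked for Home_ and Away_ prefixes next to the original match columns. The following is a sketch of an alternative that uses pivot instead of unstack and joins the statistics back onto the original df. It is not the answer's method. It rebuilds the long table compactly, uses groupby shift (equivalent to the shift(1).rolling(1).sum() lambda for a window of one), and assumes the rows of df are already in chronological order.

```python
import numpy as np
import pandas as pd

data = [
    ["20-10-2020", "PSG", "Man U", 1, 2],
    ["20-10-2020", "Leipzig", "Istanbul", 2, 0],
    ["27-10-2020", "Istanbul", "PSG", 0, 2],
    ["27-10-2020", "Man U", "Leipzig", 5, 0],
]
df = pd.DataFrame(data, columns=["Date", "Home", "Away", "HG", "AG"])

# Long layout: one row per (match, team).
home = df.reset_index().rename(
    columns={"Home": "team", "HG": "team_goals", "AG": "opponent_goals"}
)
away = df.reset_index().rename(
    columns={"Away": "team", "AG": "team_goals", "HG": "opponent_goals"}
)
home["site"] = "Home"
away["site"] = "Away"
cols = ["index", "team", "team_goals", "opponent_goals", "site"]
df_team = pd.concat([home[cols], away[cols]], ignore_index=True)
# Assumption: the match index increases with time, so sorting on it is chronological.
df_team = df_team.sort_values("index", kind="stable")

# Points per match, then the previous match's points and goals for each team.
df_team["pts"] = np.select(
    [
        df_team["team_goals"] > df_team["opponent_goals"],
        df_team["team_goals"] == df_team["opponent_goals"],
    ],
    [3, 1],
    default=0,
)
df_team["form_l1_before"] = df_team.groupby("team")["pts"].shift(1)
df_team["goal_l1_before"] = df_team.groupby("team")["team_goals"].shift(1)

# Back to one row per match: one column per (statistic, site) pair.
stats = ["form_l1_before", "goal_l1_before"]
wide = df_team.pivot(index="index", columns="site", values=stats)
# Flatten ("form_l1_before", "Home") into "Home_form_l1_before".
wide.columns = [f"{site}_{stat}" for stat, site in wide.columns]

df_final = df.join(wide)
print(df_final)
```

Joining onto df keeps the original Date, Home, Away, HG and AG columns untouched, so only the per-team statistics go through the long-to-wide round trip.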

