簡體   English   中英

將分組的熊貓數據應用回原始數據框

[英]Applying grouped pandas data back to the original dataframe

我有下面的數據框,我正在使用:

這些是我試圖按游戲分組的國際象棋游戲,然后根據該游戲中所走的步數在每場比賽中執行一項功能......

        game_id     move_number colour  avg_centi
0       03gDhPWr    1           white   NaN
1       03gDhPWr    2           black   37.0
2       03gDhPWr    3           white   61.0
3       03gDhPWr    4           black   -5.0
4       03gDhPWr    5           white   26.0
5       03gDhPWr    6           black   31.0
6       03gDhPWr    7           white   -2.0
... ... ... ... ...
110091  zzaiRa7s    34          black   NaN
110092  zzaiRa7s    35          white   NaN
110093  zzaiRa7s    36          black   NaN
110094  zzaiRa7s    37          white   NaN
110095  zzaiRa7s    38          black   NaN
110096  zzaiRa7s    39          white   NaN
110097  zzaiRa7s    40          black   NaN

具體來說,我使用pd.cut創建一個新列game_phase ,其中列出了給定的移動是在開局、中局還是終局中進行的。

我正在使用以下代碼來實現這一點。 請注意,每個游戲必須根據該游戲中進行的移動總數划分為openingmiddlegameendgame箱。

def define_move_phase(x):
    bins = (0, round(x['move_number'].max() * 1/3), round(x['move_number'].max() * 2/3), x['move_number'].max())    
    phases = ["opening", "middlegame", "endgame"]
    try:
        x.loc[:, 'phase'] = pd.cut(x['move_number'], bins, labels=phases)
    except ValueError:
        x.loc[:, 'phase'] = None
    print(x)

df.groupby('game_id').apply(define_move_phase)

該函數中的print語句顯示該函數正在處理各個組(見下文),但它不會將phase列應用回原始數據幀。

     game_id  move_number colour  avg_centi    phase
0   03gDhPWr            1  white        NaN  opening
1   03gDhPWr            2  black       37.0  opening
2   03gDhPWr            3  white       61.0  opening
3   03gDhPWr            4  black       -5.0  opening
4   03gDhPWr            5  white       26.0  opening
5   03gDhPWr            6  black       31.0  opening
6   03gDhPWr            7  white       -2.0  opening
..       ...          ...    ...        ...      ...
54  03gDhPWr           55  white       58.0  endgame
55  03gDhPWr           56  black       26.0  endgame
56  03gDhPWr           57  white      116.0  endgame
57  03gDhPWr           58  black     2000.0  endgame
58  03gDhPWr           59  white        0.0  endgame
59  03gDhPWr           60  black        0.0  endgame
60  03gDhPWr           61  white        NaN  endgame

[61 rows x 5 columns]
     game_id  move_number colour  avg_centi    phase
0   03gDhPWr            1  white        NaN  opening
1   03gDhPWr            2  black       37.0  opening
2   03gDhPWr            3  white       61.0  opening
3   03gDhPWr            4  black       -5.0  opening
4   03gDhPWr            5  white       26.0  opening
5   03gDhPWr            6  black       31.0  opening
6   03gDhPWr            7  white       -2.0  opening
..       ...          ...    ...        ...      ...
54  03gDhPWr           55  white       58.0  endgame
55  03gDhPWr           56  black       26.0  endgame
56  03gDhPWr           57  white      116.0  endgame
57  03gDhPWr           58  black     2000.0  endgame
58  03gDhPWr           59  white        0.0  endgame
59  03gDhPWr           60  black        0.0  endgame
60  03gDhPWr           61  white        NaN  endgame

[61 rows x 5 columns]

等等...

我想將新的phase列應用回原始數據幀或再次將分組的數據幀取消組合為一個大數據幀。 這樣做的最佳方法是什么?

您的函數沒有 return 語句

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM