[英]Efficiently applying a function to a grouped pandas DataFrame in parallel
[英]Applying grouped pandas data back to the original dataframe
我有下面的數據框,我正在使用:
這些是我試圖按游戲分組的國際象棋游戲,然后根據該游戲中所走的步數在每場比賽中執行一項功能......
game_id move_number colour avg_centi
0 03gDhPWr 1 white NaN
1 03gDhPWr 2 black 37.0
2 03gDhPWr 3 white 61.0
3 03gDhPWr 4 black -5.0
4 03gDhPWr 5 white 26.0
5 03gDhPWr 6 black 31.0
6 03gDhPWr 7 white -2.0
... ... ... ... ...
110091 zzaiRa7s 34 black NaN
110092 zzaiRa7s 35 white NaN
110093 zzaiRa7s 36 black NaN
110094 zzaiRa7s 37 white NaN
110095 zzaiRa7s 38 black NaN
110096 zzaiRa7s 39 white NaN
110097 zzaiRa7s 40 black NaN
具體來說,我使用pd.cut
創建一個新列game_phase
,其中列出了給定的移動是在開局、中局還是終局中進行的。
我正在使用以下代碼來實現這一點。 請注意,每個游戲必須根據該游戲中進行的移動總數划分為opening
、 middlegame
和endgame
箱。
def define_move_phase(x):
bins = (0, round(x['move_number'].max() * 1/3), round(x['move_number'].max() * 2/3), x['move_number'].max())
phases = ["opening", "middlegame", "endgame"]
try:
x.loc[:, 'phase'] = pd.cut(x['move_number'], bins, labels=phases)
except ValueError:
x.loc[:, 'phase'] = None
print(x)
df.groupby('game_id').apply(define_move_phase)
該函數中的print
語句顯示該函數正在處理各個組(見下文),但它不會將phase
列應用回原始數據幀。
game_id move_number colour avg_centi phase
0 03gDhPWr 1 white NaN opening
1 03gDhPWr 2 black 37.0 opening
2 03gDhPWr 3 white 61.0 opening
3 03gDhPWr 4 black -5.0 opening
4 03gDhPWr 5 white 26.0 opening
5 03gDhPWr 6 black 31.0 opening
6 03gDhPWr 7 white -2.0 opening
.. ... ... ... ... ...
54 03gDhPWr 55 white 58.0 endgame
55 03gDhPWr 56 black 26.0 endgame
56 03gDhPWr 57 white 116.0 endgame
57 03gDhPWr 58 black 2000.0 endgame
58 03gDhPWr 59 white 0.0 endgame
59 03gDhPWr 60 black 0.0 endgame
60 03gDhPWr 61 white NaN endgame
[61 rows x 5 columns]
game_id move_number colour avg_centi phase
0 03gDhPWr 1 white NaN opening
1 03gDhPWr 2 black 37.0 opening
2 03gDhPWr 3 white 61.0 opening
3 03gDhPWr 4 black -5.0 opening
4 03gDhPWr 5 white 26.0 opening
5 03gDhPWr 6 black 31.0 opening
6 03gDhPWr 7 white -2.0 opening
.. ... ... ... ... ...
54 03gDhPWr 55 white 58.0 endgame
55 03gDhPWr 56 black 26.0 endgame
56 03gDhPWr 57 white 116.0 endgame
57 03gDhPWr 58 black 2000.0 endgame
58 03gDhPWr 59 white 0.0 endgame
59 03gDhPWr 60 black 0.0 endgame
60 03gDhPWr 61 white NaN endgame
[61 rows x 5 columns]
等等...
我想將新的phase
列應用回原始數據幀或再次將分組的數據幀取消組合為一個大數據幀。 這樣做的最佳方法是什么?
您的函數沒有 return 語句
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.