Applying grouped pandas data back to the original dataframe

I am working with the dataframe below:

These are chess games. I am trying to group them by game and then apply a function to each game based on the number of moves played in that game.

        game_id     move_number colour  avg_centi
0       03gDhPWr    1           white   NaN
1       03gDhPWr    2           black   37.0
2       03gDhPWr    3           white   61.0
3       03gDhPWr    4           black   -5.0
4       03gDhPWr    5           white   26.0
5       03gDhPWr    6           black   31.0
6       03gDhPWr    7           white   -2.0
... ... ... ... ...
110091  zzaiRa7s    34          black   NaN
110092  zzaiRa7s    35          white   NaN
110093  zzaiRa7s    36          black   NaN
110094  zzaiRa7s    37          white   NaN
110095  zzaiRa7s    38          black   NaN
110096  zzaiRa7s    39          white   NaN
110097  zzaiRa7s    40          black   NaN

Specifically, I am using pd.cut to create a new column, phase, which records whether each move was played in the opening, middlegame, or endgame.

I'm using the following code to achieve this. Note that each game must be partitioned into opening, middlegame, and endgame bins based on the total number of moves played in that game.

import pandas as pd

def define_move_phase(x):
    # Split this game's moves into thirds based on its total move count
    n_moves = x['move_number'].max()
    bins = (0, round(n_moves * 1/3), round(n_moves * 2/3), n_moves)
    phases = ["opening", "middlegame", "endgame"]
    try:
        x.loc[:, 'phase'] = pd.cut(x['move_number'], bins, labels=phases)
    except ValueError:
        # Very short games produce duplicate bin edges, which pd.cut rejects
        x.loc[:, 'phase'] = None
    print(x)

df.groupby('game_id').apply(define_move_phase)
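To see what the per-game bins look like, the edge computation from define_move_phase can be run on its own. This is a minimal sketch: phase_bins is a hypothetical helper that copies the function's arithmetic, and the 61-move series stands in for a game the length of 03gDhPWr. It also shows why the try/except is there: for a 2-move game the rounded edges collide and pd.cut raises ValueError.

```python
import pandas as pd

PHASES = ["opening", "middlegame", "endgame"]

def phase_bins(n_moves):
    # Hypothetical helper: same edges as define_move_phase (thirds of the game, rounded)
    return (0, round(n_moves * 1/3), round(n_moves * 2/3), n_moves)

# Hypothetical 61-move game (moves 1..61), the same length as 03gDhPWr
moves = pd.Series(range(1, 62))
edges = phase_bins(int(moves.max()))
print(edges)  # (0, 20, 41, 61)

# Intervals are right-closed: (0, 20] opening, (20, 41] middlegame, (41, 61] endgame
phase = pd.cut(moves, edges, labels=PHASES)

# A 2-move game gives edges (0, 1, 1, 2); the duplicate edge makes pd.cut raise
try:
    pd.cut(pd.Series([1, 2]), phase_bins(2), labels=PHASES)
    short_game_raises = False
except ValueError as err:
    short_game_raises = True
    print("2-move game:", err)
```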

The print statement in that function shows that it is working on the individual groups (see below), but the phase column is not applied back to the original dataframe.

     game_id  move_number colour  avg_centi    phase
0   03gDhPWr            1  white        NaN  opening
1   03gDhPWr            2  black       37.0  opening
2   03gDhPWr            3  white       61.0  opening
3   03gDhPWr            4  black       -5.0  opening
4   03gDhPWr            5  white       26.0  opening
5   03gDhPWr            6  black       31.0  opening
6   03gDhPWr            7  white       -2.0  opening
..       ...          ...    ...        ...      ...
54  03gDhPWr           55  white       58.0  endgame
55  03gDhPWr           56  black       26.0  endgame
56  03gDhPWr           57  white      116.0  endgame
57  03gDhPWr           58  black     2000.0  endgame
58  03gDhPWr           59  white        0.0  endgame
59  03gDhPWr           60  black        0.0  endgame
60  03gDhPWr           61  white        NaN  endgame

[61 rows x 5 columns]
     game_id  move_number colour  avg_centi    phase
0   03gDhPWr            1  white        NaN  opening
1   03gDhPWr            2  black       37.0  opening
2   03gDhPWr            3  white       61.0  opening
3   03gDhPWr            4  black       -5.0  opening
4   03gDhPWr            5  white       26.0  opening
5   03gDhPWr            6  black       31.0  opening
6   03gDhPWr            7  white       -2.0  opening
..       ...          ...    ...        ...      ...
54  03gDhPWr           55  white       58.0  endgame
55  03gDhPWr           56  black       26.0  endgame
56  03gDhPWr           57  white      116.0  endgame
57  03gDhPWr           58  black     2000.0  endgame
58  03gDhPWr           59  white        0.0  endgame
59  03gDhPWr           60  black        0.0  endgame
60  03gDhPWr           61  white        NaN  endgame

[61 rows x 5 columns]

etc...

I would like to apply the new phase column back to the original dataframe, or to combine the grouped dataframes into one big dataframe again. What is the best way to do that?

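Your function has no return statement. groupby().apply() builds its result from whatever the function returns for each group, so a function that only modifies and prints its group returns None, and the phase column is never handed back. End define_move_phase with return x and use the value that apply() returns.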
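A minimal sketch of that fix, run on a small made-up dataframe (the game IDs "a" and "b" are hypothetical). Beyond adding return x, it makes two choices of its own: it works on an explicit copy of each group, and it passes group_keys=False so the combined result keeps the original row index and the phase column can be assigned straight back onto df. Assigning only the phase column, rather than replacing df with the whole result, avoids depending on whether a given pandas version passes the game_id column into the function.

```python
import pandas as pd

PHASES = ["opening", "middlegame", "endgame"]

def define_move_phase(x):
    n_moves = x["move_number"].max()
    bins = (0, round(n_moves * 1/3), round(n_moves * 2/3), n_moves)
    x = x.copy()  # work on an explicit copy of the group
    try:
        x["phase"] = pd.cut(x["move_number"], bins, labels=PHASES)
    except ValueError:
        # Very short games give duplicate bin edges; mark them as missing
        x["phase"] = None
    return x  # hand the modified group back to apply()

# Hypothetical data: a 6-move game "a" and a 2-move game "b"
df = pd.DataFrame({
    "game_id": ["a"] * 6 + ["b"] * 2,
    "move_number": list(range(1, 7)) + [1, 2],
})

# group_keys=False keeps the original row index, so the column aligns on assignment
df["phase"] = df.groupby("game_id", group_keys=False).apply(define_move_phase)["phase"]
print(df)
```

Recent pandas releases (2.2 and later) may warn that apply operated on the grouping columns; since the function only reads move_number, adding include_groups=False to the apply call should silence that without changing the result.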

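Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.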

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM