
Pivot Multiple Pandas Rows into Columns Based on Groupby Max

I'm fairly new to Python and pandas and have a problem I'm not quite sure how to solve. I have a pandas DataFrame that contains hockey players who have played for multiple teams in the same year:

Player         Season      Team      GP        G      A       TP      
Player A        2020        A        10        8      3       11
Player A        2020        B        25        10     5       15
Player A        2020        C        6         4      7       11
Player B        2020        A        30        20     6       26
Player B        2020        B        25        18     5       23

I want to combine the rows for the same player and season into a single row, and order the column groups by how many games the player played for each team. In the example above, Team B's numbers would come first for Player A because Player A played the most games for Team B, followed by Team A and then Team C. If a player has played for fewer than three teams, I'd like NA filled in for the unused columns.

For example, the df above would turn into the following (Team1 is the team the player played the most games for):

Player    Season  Team1  GP1  G1  A1  TP1  Team2  GP2  G2  A2  TP2  Team3  GP3  G3  A3  TP3
Player A  2020    B      25   10  5   15   A      10   8   3   11   C      6    4   7   11
Player B  2020    A      30   20  6   26   B      25   18  5   23   NA     NA   NA  NA  NA

My first thought is to use a series of groupby max operations, but I'm not sure that will achieve the desired outcome. Any help would be greatly appreciated!

You could sort, then pivot:

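# Sort by games played, most first, so each player's top team is numbered 1.
# The lambda makes cumcount run on the sorted frame; calling df.groupby(...)
# here would number the rows in their original order instead.
a = (df.sort_values('GP', ascending=False)
       .assign(col=lambda d: d.groupby(['Player', 'Season']).cumcount() + 1)
       .pivot_table(index=['Player', 'Season'], columns='col', aggfunc='first')
)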

# flatten the two-level (stat, col) column labels into names such as GP1, Team2:
a.columns = [f'{x}{y}' for x,y in a.columns]
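
For reference, here is a self-contained sketch of the same sort-then-pivot idea run against the sample data from the question. The names `keys`, `stats`, and `wide`, and the extra column-reordering step, are illustrative additions and not part of the answer above. It assumes GP is the right measure of "played the most for" and that ties in GP can be broken arbitrarily.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player A", "Player B", "Player B"],
    "Season": [2020, 2020, 2020, 2020, 2020],
    "Team":   ["A", "B", "C", "A", "B"],
    "GP":     [10, 25, 6, 30, 25],
    "G":      [8, 10, 4, 20, 18],
    "A":      [3, 5, 7, 6, 5],
    "TP":     [11, 15, 11, 26, 23],
})

keys = ["Player", "Season"]             # one output row per player-season
stats = ["Team", "GP", "G", "A", "TP"]  # desired order within each team block

wide = (
    df.sort_values("GP", ascending=False)
      # Number each player's teams 1, 2, 3, ... by games played (on the sorted frame).
      .assign(col=lambda d: d.groupby(keys).cumcount() + 1)
      .pivot_table(index=keys, columns="col", aggfunc="first")
)

# pivot_table sorts the stat names alphabetically; put them back into
# Team, GP, G, A, TP order, grouped by team number.
n_teams = int(wide.columns.get_level_values(1).max())
order = [(s, i) for i in range(1, n_teams + 1) for s in stats]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))

# Flatten (stat, number) pairs into Team1, GP1, ... and restore Player/Season columns.
wide.columns = [f"{s}{i}" for s, i in wide.columns]
wide = wide.reset_index()

print(wide)
```

Players with fewer teams than the maximum get NaN in the unused columns, so the numeric columns that contain gaps come back as floats.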
