I'm fairly new to Python and pandas and have a problem I'm not sure how to solve. I have a pandas DataFrame containing hockey players who have played for multiple teams in the same season:
Player    Season  Team  GP  G   A  TP
Player A  2020    A     10  8   3  11
Player A  2020    B     25  10  5  15
Player A  2020    C     6   4   7  11
Player B  2020    A     30  20  6  26
Player B  2020    B     25  18  5  23
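To make the example reproducible, the input table can be built directly. This is a minimal sketch that copies the column names and values from the table above. The frame is built from explicit columns rather than parsed from text, because the player names contain spaces:

```python
import pandas as pd

# Sample data copied from the table above (one row per player/season/team)
df = pd.DataFrame(
    {
        "Player": ["Player A", "Player A", "Player A", "Player B", "Player B"],
        "Season": [2020, 2020, 2020, 2020, 2020],
        "Team": ["A", "B", "C", "A", "B"],
        "GP": [10, 25, 6, 30, 25],
        "G": [8, 10, 4, 20, 18],
        "A": [3, 5, 7, 6, 5],
        "TP": [11, 15, 11, 26, 23],
    }
)
print(df)
```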
I want to combine rows that contain the same player in the same season, and order the resulting columns by games played, starting with the team the player played the most games for. In the example above, all of Team B's numbers would come first because Player A played the most games for Team B, followed by Team A and then Team C. If a player has played for fewer than three teams, I'd like NA filled in for the unused columns.
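The ordering rule (most games played first) amounts to ranking each player's teams by `GP` within a `(Player, Season)` group. The sketch below assumes the sample data from the table. The `slot` column is a hypothetical helper name; `method="first"` gives tied teams distinct slots:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Player": ["Player A", "Player A", "Player A", "Player B", "Player B"],
        "Season": [2020, 2020, 2020, 2020, 2020],
        "Team": ["A", "B", "C", "A", "B"],
        "GP": [10, 25, 6, 30, 25],
        "G": [8, 10, 4, 20, 18],
        "A": [3, 5, 7, 6, 5],
        "TP": [11, 15, 11, 26, 23],
    }
)

# Rank teams within each player/season by games played, largest first.
# "slot" is a hypothetical helper column: 1 = most games, 2 = next, ...
df["slot"] = (
    df.groupby(["Player", "Season"])["GP"]
    .rank(method="first", ascending=False)
    .astype(int)
)
print(df.sort_values(["Player", "Season", "slot"]))
```

The slot number then becomes the suffix in `Team1`, `GP1`, `Team2`, and so on.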
For example, the df above would turn into this (Team1 is the team with the most games played):
Player    Season  Team1  GP1  G1  A1  TP1  Team2  GP2  G2  A2  TP2  Team3  GP3  G3  A3  TP3
Player A  2020    B      25   10  5   15   A      10   8   3   11   C      6    4   7   11
Player B  2020    A      30   20  6   26   B      25   18  5   23   NA     NA   NA  NA  NA
My first idea was to use a series of groupby/max operations, but I'm not sure that would give the desired result. Any help would be greatly appreciated!
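You could sort by games played (most first), number each player's teams in that order, then pivot:
a = (df.sort_values('GP', ascending=False, kind='stable')
       # number each player's teams 1, 2, 3, ... within a season, after sorting
       .assign(col=lambda d: d.groupby(['Player', 'Season']).cumcount() + 1)
       .pivot_table(index=['Player', 'Season'], columns='col', aggfunc='first')
    )
# reorder to Team1 GP1 G1 A1 TP1 Team2 ..., then rename:
slots = sorted(a.columns.get_level_values(1).unique())
a = a[[(c, i) for i in slots for c in ['Team', 'GP', 'G', 'A', 'TP']]]
a.columns = [f'{x}{y}' for x, y in a.columns]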
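Here is a self-contained sketch of the same sort, number, and pivot approach. It uses `pivot` instead of `pivot_table`, since each player/season/slot combination is unique and no aggregation is needed. `max_teams` is a hypothetical parameter. It fixes the output at three team slots, so the Team3 columns exist (as NaN) even when no player in the data has three teams:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Player": ["Player A", "Player A", "Player A", "Player B", "Player B"],
        "Season": [2020, 2020, 2020, 2020, 2020],
        "Team": ["A", "B", "C", "A", "B"],
        "GP": [10, 25, 6, 30, 25],
        "G": [8, 10, 4, 20, 18],
        "A": [3, 5, 7, 6, 5],
        "TP": [11, 15, 11, 26, 23],
    }
)

value_cols = ["Team", "GP", "G", "A", "TP"]
max_teams = 3  # hypothetical: number of team slots to always produce

wide = (
    df.sort_values("GP", ascending=False, kind="stable")
    # number teams 1, 2, 3, ... within each player/season, in sorted order
    .assign(col=lambda d: d.groupby(["Player", "Season"]).cumcount() + 1)
    # one row per player/season; columns become (value, slot) pairs
    .pivot(index=["Player", "Season"], columns="col")
)

# Order as Team1 GP1 G1 A1 TP1 Team2 ...; slots with no data are filled with NaN
order = [(c, i) for i in range(1, max_teams + 1) for c in value_cols]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))
wide.columns = [f"{c}{i}" for c, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because of the missing slots, numeric columns that contain NaN come out as floats.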