How to make new columns out of every second row in a pandas df
I have a data frame of NBA data that I am having a hard time manipulating. I would like to change df1 into df2 by placing both teams and their scores for a game on the same row, with each game appearing twice so that the outcome is shown from both teams' standpoints:
df1
GameID TeamID TeamAbb PTS
0 1001 TOR 99
0 1023 ATL 86
1 1004 DAL 102
1 1003 POR 100
2 1015 LAL 96
2 1029 MIL 85
df2
GameID Team1ID Team2ID Team1Abb Team2Abb Team1PTS Team2PTS
0 1001 1023 TOR ATL 99 86
0 1023 1001 ATL TOR 86 99
1 1004 1003 DAL POR 102 100
1 1003 1004 POR DAL 100 102
So in essence, a sort of widening of the data frame.
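For anyone who wants to run the answers below, here is a minimal sketch that rebuilds df1 from the table in the question. The values are copied from that table; the integer dtypes are an assumption, since the question does not state them.

```python
import pandas as pd

# Sample data copied from the df1 table in the question.
# Integer dtypes for GameID, TeamID and PTS are assumed.
df1 = pd.DataFrame({
    "GameID": [0, 0, 1, 1, 2, 2],
    "TeamID": [1001, 1023, 1004, 1003, 1015, 1029],
    "TeamAbb": ["TOR", "ATL", "DAL", "POR", "LAL", "MIL"],
    "PTS": [99, 86, 102, 100, 96, 85],
})

print(df1)
```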
Try:
df2 = df1.set_index(['GameID', df1.groupby('GameID').cumcount()+1]).unstack()
df2.columns=[f'{i}_{j}' for i, j in df2.columns]
df2.reset_index()
Output:
GameID TeamID_1 TeamID_2 TeamAbb_1 TeamAbb_2 PTS_1 PTS_2
0 0 1001 1023 TOR ATL 99 86
1 1 1004 1003 DAL POR 102 100
2 2 1015 1029 LAL MIL 96 85
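Details:
set_index and unstack reshape the data frame, with groupby and cumcount numbering the rows within each game to get 1 and 2. The list comprehension flattens the resulting MultiIndex column headers, and reset_index turns GameID back into a regular column.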
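The reshaping in the snippet above can be broken into separate steps to make the intermediate results visible. This is only a sketch: the names pos and wide are introduced here for illustration, and df1 is rebuilt from the question's table so the block runs on its own.

```python
import pandas as pd

# df1 rebuilt from the table in the question (dtypes assumed)
df1 = pd.DataFrame({
    "GameID": [0, 0, 1, 1, 2, 2],
    "TeamID": [1001, 1023, 1004, 1003, 1015, 1029],
    "TeamAbb": ["TOR", "ATL", "DAL", "POR", "LAL", "MIL"],
    "PTS": [99, 86, 102, 100, 96, 85],
})

# Step 1: number the rows within each game. cumcount() starts at 0,
# so adding 1 gives the positions 1 and 2.
pos = df1.groupby("GameID").cumcount() + 1

# Step 2: index by (GameID, position) and pivot the position level
# into the columns, giving MultiIndex columns such as ("TeamID", 1).
wide = df1.set_index(["GameID", pos]).unstack()

# Step 3: flatten the MultiIndex column headers into "TeamID_1", etc.
wide.columns = [f"{name}_{n}" for name, n in wide.columns]

# Step 4: move GameID from the index back into a regular column.
wide = wide.reset_index()

print(wide)
```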
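To list each game from both teams' standpoints, build the two orderings separately and concatenate them:
import pandas as pd

# Create home team and visiting team records
g = df1.groupby('GameID').cumcount()    # 0 for the first row of a game, 1 for the second
dfh = df1.set_index(['GameID', g + 1])  # first row -> 1, second row -> 2
dfv = df1.set_index(['GameID', 2 - g])  # first row -> 2, second row -> 1
dfh = dfh.unstack()
dfh.columns = [f'{i}_{j}' for i, j in dfh.columns]
dfv = dfv.unstack()
dfv.columns = [f'{i}_{j}' for i, j in dfv.columns]
# Concatenate home and visiting records; a stable sort keeps the
# home record ahead of the visiting record within each game
pd.concat([dfh, dfv]).sort_index(kind='mergesort').reset_index()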
Output:
GameID TeamID_1 TeamID_2 TeamAbb_1 TeamAbb_2 PTS_1 PTS_2
0 0 1001 1023 TOR ATL 99 86
1 0 1023 1001 ATL TOR 86 99
2 1 1004 1003 DAL POR 102 100
3 1 1003 1004 POR DAL 100 102
4 2 1015 1029 LAL MIL 96 85
5 2 1029 1015 MIL LAL 85 96
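As an alternative to unstacking, a self-merge on GameID can also produce one row per team per game. This is a different technique from the answer above, sketched under the assumption that every game has exactly two rows with distinct TeamID values. The target column names (Team1ID, Team2ID, and so on) are taken from the df2 layout in the question; pairs is a name introduced here for illustration.

```python
import pandas as pd

# df1 rebuilt from the table in the question (dtypes assumed)
df1 = pd.DataFrame({
    "GameID": [0, 0, 1, 1, 2, 2],
    "TeamID": [1001, 1023, 1004, 1003, 1015, 1029],
    "TeamAbb": ["TOR", "ATL", "DAL", "POR", "LAL", "MIL"],
    "PTS": [99, 86, 102, 100, 96, 85],
})

# Pair every row with every row of the same game. This also pairs
# each team with itself, so those self-pairs are filtered out.
pairs = df1.merge(df1, on="GameID", suffixes=("_1", "_2"))
pairs = pairs[pairs["TeamID_1"] != pairs["TeamID_2"]].reset_index(drop=True)

# Rename to the column names used for df2 in the question
pairs = pairs.rename(columns={
    "TeamID_1": "Team1ID", "TeamID_2": "Team2ID",
    "TeamAbb_1": "Team1Abb", "TeamAbb_2": "Team2Abb",
    "PTS_1": "Team1PTS", "PTS_2": "Team2PTS",
})

# Reorder the columns to match the df2 layout
df2 = pairs[["GameID", "Team1ID", "Team2ID",
             "Team1Abb", "Team2Abb", "Team1PTS", "Team2PTS"]]

print(df2)
```

The merge compares every pair of rows within a game, so it does more work than unstack on large tables, but it avoids building two frames and concatenating them.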