How to make new columns out of every second row in a pandas df

I have a data frame of NBA data that I am having a hard time manipulating. I would like to turn df1 into df2 by putting both teams in a game, along with their scores, on the same row, twice, so that each game's outcome appears from both teams' standpoints:

df1

GameID     TeamID     TeamAbb     PTS
   0        1001        TOR        99
   0        1023        ATL        86
   1        1004        DAL        102
   1        1003        POR        100
   2        1015        LAL        96
   2        1029        MIL        85
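To try the answers below, df1 can be rebuilt from the table above. The values are copied from the question; building it with a dict of columns is just one convenient way to set it up:

```python
import pandas as pd

# df1 as shown in the question: two rows (one per team) for each GameID
df1 = pd.DataFrame({
    'GameID': [0, 0, 1, 1, 2, 2],
    'TeamID': [1001, 1023, 1004, 1003, 1015, 1029],
    'TeamAbb': ['TOR', 'ATL', 'DAL', 'POR', 'LAL', 'MIL'],
    'PTS': [99, 86, 102, 100, 96, 85],
})
print(df1)
```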

df2

GameID     Team1ID     Team2ID     Team1Abb      Team2Abb    Team1PTS    Team2PTS
   0        1001        1023         TOR           ATL          99          86
   0        1023        1001         ATL           TOR          86          99
   1        1004        1003         DAL           POR          102         100
   1        1003        1004         POR           DAL          100         102

So, in essence, I want to widen the data frame.

Try:

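# Number the rows within each game (1, 2) and use that number as a second index level
df2 = df1.set_index(['GameID', df1.groupby('GameID').cumcount() + 1]).unstack()
# Flatten the (column, position) MultiIndex headers into names like 'TeamID_1'
df2.columns = [f'{i}_{j}' for i, j in df2.columns]
df2 = df2.reset_index()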

Output:

  GameID  TeamID_1  TeamID_2 TeamAbb_1 TeamAbb_2  PTS_1  PTS_2
0       0      1001      1023       TOR       ATL     99     86
1       1      1004      1003       DAL       POR    102    100
2       2      1015      1029       LAL       MIL     96     85

Details:

  • Group by 'GameID' and use cumcount (plus 1) to number the rows within each game 1 and 2.
  • Then, use those numbers as a second index level and unstack them into the columns; flatten the resulting MultiIndex column headers with a list comprehension.
  • Lastly, call reset_index to turn GameID back into a regular column.
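The steps above can be traced on a small slice of the question's data. This is a minimal sketch that uses only the first two games and prints the intermediate results; the names pos and wide are introduced here purely for illustration:

```python
import pandas as pd

# First two games from df1 in the question
df1 = pd.DataFrame({
    'GameID': [0, 0, 1, 1],
    'TeamID': [1001, 1023, 1004, 1003],
    'TeamAbb': ['TOR', 'ATL', 'DAL', 'POR'],
    'PTS': [99, 86, 102, 100],
})

# Step 1: number the rows within each game
pos = df1.groupby('GameID').cumcount() + 1
print(pos.tolist())  # [1, 2, 1, 2]

# Step 2: (GameID, position) becomes the row index; unstack moves the
# position level into the columns, giving (column name, position) pairs
wide = df1.set_index(['GameID', pos]).unstack()
print(wide.columns)

# Step 3: flatten the MultiIndex headers, then restore GameID as a column
wide.columns = [f'{i}_{j}' for i, j in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because unstack keeps the original column order and puts the position as the inner level, the flattened headers come out grouped by field (TeamID_1, TeamID_2, TeamAbb_1, ...), which matches the output shown above.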

Update, per the comment below (to show each game from both teams' standpoints, as in df2):

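import pandas as pd

# Create home team and visiting team records.
# g is 0 for the first row of each game and 1 for the second.
g = df1.groupby('GameID').cumcount()
dfh = df1.set_index(['GameID', g + 1])  # first team -> position 1, second -> 2
dfv = df1.set_index(['GameID', 2 - g])  # positions swapped: first -> 2, second -> 1

dfh = dfh.unstack()
dfh.columns = [f'{i}_{j}' for i, j in dfh.columns]

dfv = dfv.unstack()
dfv.columns = [f'{i}_{j}' for i, j in dfv.columns]

# Concatenate home and visiting records; a stable sort keeps each game's
# home row ahead of its visiting row
df2 = pd.concat([dfh, dfv]).sort_index(kind='mergesort').reset_index()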

Output:

   GameID  TeamID_1  TeamID_2 TeamAbb_1 TeamAbb_2  PTS_1  PTS_2
0       0      1001      1023       TOR       ATL     99     86
1       0      1023      1001       ATL       TOR     86     99
2       1      1004      1003       DAL       POR    102    100
3       1      1003      1004       POR       DAL    100    102
4       2      1015      1029       LAL       MIL     96     85
5       2      1029      1015       MIL       LAL     85     96
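A different technique, not the one used in this answer, is a self-merge: join df1 to itself on GameID and drop the rows where a team is paired with itself. This is a sketch that assumes every GameID has exactly two rows, as in the question; it also renames the columns to the Team1ID/Team2ID style used for df2 in the question:

```python
import pandas as pd

# df1 from the question
df1 = pd.DataFrame({
    'GameID': [0, 0, 1, 1, 2, 2],
    'TeamID': [1001, 1023, 1004, 1003, 1015, 1029],
    'TeamAbb': ['TOR', 'ATL', 'DAL', 'POR', 'LAL', 'MIL'],
    'PTS': [99, 86, 102, 100, 96, 85],
})

# Self-join on GameID: every team is paired with every team in the same game,
# including itself, so a two-team game yields four rows.
# Assumes each GameID has exactly two rows, as in the question.
pairs = df1.merge(df1, on='GameID', suffixes=('_1', '_2'))

# Drop the rows that pair a team with itself, leaving one row per team per game
pairs = pairs[pairs['TeamID_1'] != pairs['TeamID_2']].reset_index(drop=True)

# Rename to the column names used for df2 in the question, in that order
df2 = pairs.rename(columns={
    'TeamID_1': 'Team1ID', 'TeamID_2': 'Team2ID',
    'TeamAbb_1': 'Team1Abb', 'TeamAbb_2': 'Team2Abb',
    'PTS_1': 'Team1PTS', 'PTS_2': 'Team2PTS',
})[['GameID', 'Team1ID', 'Team2ID', 'Team1Abb', 'Team2Abb', 'Team1PTS', 'Team2PTS']]

print(df2)
```

Unlike the cumcount approach, the merge does not need a position counter; the trade-off is that it temporarily builds every pairing within a game (four rows for a two-team game) before filtering.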

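Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM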