简体   繁体   中英

Merge DataFrames with different number of rows and have a new column with sum of values

I have this df1:

df1 = pd.DataFrame({'Player':['Zico', 'Leonidas', 'Didi'],
                    'Team': ['Flamengo', 'Flamengo', 'Botafogo'],
                    'Position': ['MID', 'DEF', 'MID'],
                    'Games_Away': [4, 4, 4]})

And another df2 with a different numbers of rows, where all players on the first df are present.

df2 = pd.DataFrame({'Player':['Zico', 'Leonidas', 'Didi', 'Gerson', 'Pele'],
                    'Team': ['Flamengo', 'Flamengo', 'Botafogo', 'Botafogo', 'Santos'],
                    'Position': ['MID', 'DEF', 'MID', 'MID', 'FWD'],
                    'Games_Home': [3, 4, 3, 1, 1]})

How do I merge these two dfs, in order to end up with a new column 'Total_Games' that will add up correctly?

      Player  Team  Position  Games_Home  Games_Away Total_Games
0      Zico  Flamengo      MID      3       4         7
1  Leonidas  Flamengo      DEF      4       4         8
2      Didi  Botafogo      MID      3       4         7
3    Gerson  Botafogo      MID      1       0         1
4      Pele    Santos      FWD      1       0         1

I have tried:

df_merge = df1.merge(df2, on="Player", how = 'inner')
df_merge['Total_Games']= df1['Games_Away'] + df2['Games_Home']

But this gives me:

     Player    Team_x Position_x  Games_Away    Team_y Position_y  Games_Home  Total_Games
0      Zico  Flamengo        MID           4  Flamengo        MID           3          7.0
1  Leonidas  Flamengo        DEF           4  Flamengo        DEF           4          8.0
2      Didi  Botafogo        MID           4  Botafogo        MID           3          7.0

Problems:

  • df2 players are not added
  • 'Position' and 'Team' are repeated

What is the best way of joining the dfs, keeping all players, not having duplicate columns and summing correctly 'Games_Away' + 'Games_Home', given the fact that 1 + 'NaN' must be 1?

Do outer merge

df=df1.merge(df2,on=['Player','Team','Position'],how='outer').fillna(0)
df['Game_total']=df.Games_Away+df.Games_Home
 
df
Out[241]: 
     Player      Team Position  Games_Away  Games_Home  Game_total
0      Zico  Flamengo      MID         4.0           3         7.0
1  Leonidas  Flamengo      DEF         4.0           4         8.0
2      Didi  Botafogo      MID         4.0           3         7.0
3    Gerson  Botafogo      MID         0.0           1         1.0
4      Pele    Santos      FWD         0.0           1         1.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM