
Outer Join Pandas Dataframe

I am trying to left outer join two pandas DataFrames, keeping every row of df1. The sample data frames are below:

df1:
Index   Team 1   Team 2   Team1_Score    Team2_Score
 0       A        B        25              56
 1       B        C        30              55
 2       D        E        35              75

df2:
Index   Team 1   Team 2   Team1_Avg     Team2_Avg
 0       A        B        5              15
 1       G        F        10             25
 2       C        B        15             35

dfcombined:
Index   Team 1   Team 2   Team1_Score    Team2_Score    Team1_Avg     Team2_Avg
 0       A        B        25              56           5             15
 1       B        C        30              55           35            15
 2       D        E        35              75

I tried the pandasql module, but I am not sure how to join index 1 of df1 with index 2 of df2, because the order of the teams is reversed. When the order is reversed, the Team1_Avg and Team2_Avg values also need to be swapped in the combined data frame, and I do not know how to do that with pandasql.

I would appreciate any help on this.

Setup -

df1

      Team 1 Team 2  Team1_Score  Team2_Score
Index                                        
0          A      B           25           56
1          B      C           30           55
2          D      E           35           75

df2

      Team 1 Team 2  Team1_Avg  Team2_Avg
Index                                    
0          A      B          5         15
1          G      F         10         25
2          C      B         15         35
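
For anyone who wants to reproduce this, here is a minimal sketch for building the two sample frames, using the values from the question. Naming the index "Index" is only an assumption made to match the printed output.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'Team 1': ['A', 'B', 'D'],
    'Team 2': ['B', 'C', 'E'],
    'Team1_Score': [25, 30, 35],
    'Team2_Score': [56, 55, 75],
})
df2 = pd.DataFrame({
    'Team 1': ['A', 'G', 'C'],
    'Team 2': ['B', 'F', 'B'],
    'Team1_Avg': [5, 10, 15],
    'Team2_Avg': [15, 25, 35],
})

# Assumption: the index is simply named "Index", as in the printed frames
df1.index.name = 'Index'
df2.index.name = 'Index'
```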

First, sort the Team * columns within each row so that every pair of teams appears in the same order in both frames, and reorder the Team*_Score columns in the same way. np.argsort gives the sorting order for each row.

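import numpy as np

# Row indices as a column vector, to pair with the per-row column order
i = np.arange(len(df1))[:, None]
# Column order that sorts the two team names within each row
j = np.argsort(df1[['Team 1', 'Team 2']].values, axis=1)

df1[['Team 1', 'Team 2']] = df1[['Team 1', 'Team 2']].values[i, j]
df1[['Team1_Score', 'Team2_Score']] = df1[['Team1_Score', 'Team2_Score']].values[i, j]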

Now, repeat the same process for df2, with Team * and Team*_Avg.

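# Recompute the row indices in case df2 has a different length than df1
i = np.arange(len(df2))[:, None]
j = np.argsort(df2[['Team 1', 'Team 2']].values, axis=1)

df2[['Team 1', 'Team 2']] = df2[['Team 1', 'Team 2']].values[i, j]
df2[['Team1_Avg', 'Team2_Avg']] = df2[['Team1_Avg', 'Team2_Avg']].values[i, j]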

Now, perform a left outer merge -

df1.merge(df2, on=['Team 1', 'Team 2'], how='left')

  Team 1 Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0      A      B           25           56        5.0       15.0
1      B      C           30           55       35.0       15.0
2      D      E           35           75        NaN        NaN
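
The sort-then-merge steps above could also be wrapped in a small helper that works on copies instead of modifying df1 and df2 in place. This is only a sketch: the function name sort_pairs and its parameters are made up for illustration, and the sample data is taken from the question.

```python
import numpy as np
import pandas as pd

def sort_pairs(df, key_cols, value_cols):
    """Hypothetical helper: return a copy of df in which key_cols are sorted
    within each row and value_cols are reordered the same way."""
    out = df.copy()
    keys = out[key_cols].to_numpy()
    rows = np.arange(len(out))[:, None]   # row indices as a column vector
    order = np.argsort(keys, axis=1)      # per-row sorting order of the keys
    out[key_cols] = keys[rows, order]
    out[value_cols] = out[value_cols].to_numpy()[rows, order]
    return out

# Sample data copied from the question
df1 = pd.DataFrame({'Team 1': ['A', 'B', 'D'], 'Team 2': ['B', 'C', 'E'],
                    'Team1_Score': [25, 30, 35], 'Team2_Score': [56, 55, 75]})
df2 = pd.DataFrame({'Team 1': ['A', 'G', 'C'], 'Team 2': ['B', 'F', 'B'],
                    'Team1_Avg': [5, 10, 15], 'Team2_Avg': [15, 25, 35]})

teams = ['Team 1', 'Team 2']
left = sort_pairs(df1, teams, ['Team1_Score', 'Team2_Score'])
right = sort_pairs(df2, teams, ['Team1_Avg', 'Team2_Avg'])
result = left.merge(right, on=teams, how='left')
```

Note that this approach also normalizes the team order in df1, so a row listed as C, B in df1 would come out as B, C in the result.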

Alternatively, you can stack df2 on top of a copy of itself in which the teams are flipped, using pd.concat(). The flipped copy is created by swapping the column names with rename:

df3 = df2.rename(columns={'Team 1':'Team 2','Team 2':'Team 1', 
        'Team1_Avg':'Team2_Avg','Team2_Avg':'Team1_Avg'})

Now concatenate df2 with the newly created df3 and left-merge the result onto df1:

df1.merge(pd.concat([df2,df3]),how='left',on=['Team 1','Team 2'])

This gives the desired DataFrame:

  Team 1 Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0      A      B           25           56        5.0       15.0
1      B      C           30           55       35.0       15.0
2      D      E           35           75        NaN        NaN
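
Put together, the rename-and-concat approach might look like the sketch below, again using the sample data from the question. The drop_duplicates call is an addition that is not part of the answer above: it is a guard for the assumed case where df2 lists the same pairing in both orders, which would otherwise produce duplicate keys and duplicate rows after the merge.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'Team 1': ['A', 'B', 'D'], 'Team 2': ['B', 'C', 'E'],
                    'Team1_Score': [25, 30, 35], 'Team2_Score': [56, 55, 75]})
df2 = pd.DataFrame({'Team 1': ['A', 'G', 'C'], 'Team 2': ['B', 'F', 'B'],
                    'Team1_Avg': [5, 10, 15], 'Team2_Avg': [15, 25, 35]})

# Swap the roles of team 1 and team 2 by exchanging the column names
swap = {'Team 1': 'Team 2', 'Team 2': 'Team 1',
        'Team1_Avg': 'Team2_Avg', 'Team2_Avg': 'Team1_Avg'}
df3 = df2.rename(columns=swap)

# concat aligns columns by name, so both orientations end up in one frame
both = pd.concat([df2, df3], ignore_index=True)

# Assumption: keep only the first row if a pairing appears more than once
both = both.drop_duplicates(subset=['Team 1', 'Team 2'])

result = df1.merge(both, how='left', on=['Team 1', 'Team 2'])
```

Unlike the sorting approach, this keeps the team order of df1 exactly as given, at the cost of doubling the size of df2 before the merge.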
