I am trying to outer join (on df1) two pandas dataframe. Below are the sample data frames:
df1:
Index Team 1 Team 2 Team1_Score Team2_Score
0 A B 25 56
1 B C 30 55
2 D E 35 75
df2:
Index Team 1 Team 2 Team1_Avg Team2_Avg
0 A B 5 15
1 G F 10 25
2 C B 15 35
dfcombined
Index Team 1 Team 2 Team1_Score Team2_Score Team2_Avg Team1_Avg
0 A B 25 56 5 15
1 B C 30 55 35 15
2 D E 35 75
I was trying to use the pandasql module however I am not sure how to handle the case of joining index 1 in df1 and index 2 at df2 as the order of teams is reversed. Through pandasql module, I am not sure how to switch the Team Avg values in the combined data frame if the order of team is reverse.
I would appreciate any help on this.
Setup -
df1
Team 1 Team 2 Team1_Score Team2_Score
Index
0 A B 25 56
1 B C 30 55
2 D E 35 75
df2
Team 1 Team 2 Team1_Avg Team2_Avg
Index
0 A B 5 15
1 F G 25 10
2 B C 35 15
First, we'll need to sort the Team *
columns, and accordingly sort the Team*_Score
columns in the same way. We'll use argsort
to do so.
i = np.arange(len(df1))[:, None]
j = np.argsort(df1[['Team 1', 'Team 2']], axis=1).values
df1[['Team 1', 'Team 2']] = df1[['Team 1', 'Team 2']].values[i, j]
df1[['Team1_Score', 'Team2_Score']] = df1[['Team1_Score', 'Team2_Score']].values[i, j]
Now, repeat the same process for df2
, with Team *
and Team*_Avg
.
j = np.argsort(df2[['Team 1', 'Team 2']], axis=1).values
df2[['Team 1', 'Team 2']] = df2[['Team 1', 'Team 2']].values[i, j]
df2[['Team1_Avg', 'Team2_Avg']] = df2[['Team1_Avg', 'Team2_Avg']].values[i, j]
Now, perform a left outer merge
-
df1.merge(df2, on=['Team 1', 'Team 2'], how='left')
Team 1 Team 2 Team1_Score Team2_Score Team1_Avg Team2_Avg
0 A B 25 56 5 15
1 B C 30 55 35 15
2 D E 35 75
What you can do is duplicate df2 with pd.concat()
by flipping the column names. You can do this by setting them with rename
df3 = df2.rename(columns={'Team 1':'Team 2','Team 2':'Team 1',
'Team1_Avg':'Team2_Avg','Team2_Avg':'Team1_Avg'})
Now we can do a left merge
and concat
on both df2 and the newly created df3
df1.merge(pd.concat([df2,df3]),how='left',on=['Team 1','Team 2'])
Which gives you your desired DataFrame
Team 1 Team 2 Team1_Score Team2_score Team1_Avg Team2_Avg
0 A B 25 56 5.0 15.0
1 B C 30 55 35.0 15.0
2 D E 25 75 NaN NaN
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.