Join two DataFrames by matching either of two columns in one against a single column in the other, based on a condition?

Question

I have a dataframe like this:

df1
col1       col2      col3      col4
 1           2        A         S
 3           4        A         P
 5           6        B         R
 7           8        B         B

I have another data frame:

df2
col5      col6         col3
 9         10           A
 11        12           R

I want to join these two data frames: a row of df1 should be joined if its col3 or col4 value matches a col3 value in df2.

The final data frame will look like:

df3
col1    col2    col3    col5   col6
 1       2       A       9      10
 3       4       A       9      10
 5       6       R       11     12

If a row's col3 value is present in df2, the row joins on col3; otherwise it joins on col4, provided the col4 value is present in col3 of df2.

How can I do this in the most efficient way using pandas/Python?

Answer 1

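Use two merges with the default inner join. The first joins on col3. For the second, rename col4 to col3 and merge against only those df2 rows whose col3 value was not already matched by df1's col3. Finally, concatenate the two results: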

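import pandas as pd

# First merge: inner join on df1's col3.
df3 = df1.drop('col4', axis=1).merge(df2, on='col3')
# Second merge: use col4 as the key, against df2 rows not already matched via col3.
df4 = (df1.drop('col3', axis=1).rename(columns={'col4':'col3'})
            .merge(df2[~df2['col3'].isin(df1['col3'])], on='col3'))

# Stack both results into one frame.
df = pd.concat([df3, df4], ignore_index=True)
print(df)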
   col1  col2 col3  col5  col6
0     1     2    A     9    10
1     3     4    A     9    10
2     5     6    R    11    12
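
One thing to be aware of: the filter above is applied to df2 as a whole, not per row of df1. A row-wise variant of the same rule can be sketched by building the join key first and merging once. This is only a sketch under the assumption that the sample frames from the question are the input; the names key and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({
    "col1": [1, 3, 5, 7],
    "col2": [2, 4, 6, 8],
    "col3": ["A", "A", "B", "B"],
    "col4": ["S", "P", "R", "B"],
})
df2 = pd.DataFrame({
    "col5": [9, 11],
    "col6": [10, 12],
    "col3": ["A", "R"],
})

# Per-row key: keep col3 where it exists in df2, otherwise fall back to col4.
key = df1["col3"].where(df1["col3"].isin(df2["col3"]), df1["col4"])

result = (
    df1.drop(columns=["col3", "col4"])
       .assign(col3=key)
       .merge(df2, on="col3")  # inner join drops rows where neither column matched
)
print(result)
```

On the sample data this gives the same three rows as the double merge. It may differ on other data, for example when a row's col4 value equals a key that some other row already matched through col3.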

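EDIT: To keep every row of df1, use left joins instead and combine the two results with combine_first. Note that col3 in the result keeps df1's original col3 value (B in row 2) even when the match came from col4, and col5/col6 become floats because the unmatched row holds NaN: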

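# Left joins keep every row of df1; unmatched rows get NaN in col5/col6.
df3 = df1.drop('col4', axis=1).merge(df2, on='col3', how='left')
df4 = (df1.drop('col3', axis=1).rename(columns={'col4':'col3'})
            .merge(df2, on='col3', how='left'))

# Fill the gaps in the col3-based result with the col4-based matches.
df = df3.combine_first(df4)
print(df)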
   col1  col2 col3  col5  col6
0     1     2    A   9.0  10.0
1     3     4    A   9.0  10.0
2     5     6    B  11.0  12.0
3     7     8    B   NaN   NaN
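
If the float columns are unwanted, one possible follow-up is to cast col5 and col6 to pandas' nullable integer dtype, and optionally drop the rows that found no match. This is a minimal sketch assuming the sample frames from the question; the names combined and matched are illustrative.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({
    "col1": [1, 3, 5, 7],
    "col2": [2, 4, 6, 8],
    "col3": ["A", "A", "B", "B"],
    "col4": ["S", "P", "R", "B"],
})
df2 = pd.DataFrame({
    "col5": [9, 11],
    "col6": [10, 12],
    "col3": ["A", "R"],
})

# Same left-join / combine_first steps as in the answer.
df3 = df1.drop(columns="col4").merge(df2, on="col3", how="left")
df4 = (df1.drop(columns="col3").rename(columns={"col4": "col3"})
          .merge(df2, on="col3", how="left"))
combined = df3.combine_first(df4)

# Nullable integers keep the unmatched row (as <NA>) without turning 9 into 9.0.
combined = combined.astype({"col5": "Int64", "col6": "Int64"})

# Optional: keep only the rows that found a match in df2.
matched = combined.dropna(subset=["col5", "col6"])
print(combined)
print(matched)
```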

solution1
1 ACCEPTED 2019-02-20 06:18:47
