简体   繁体   中英

How to compare two columns of two different dataframes whose columns are not unique?

I'm having two different dataframes : df1 and df2

df1 :                                
    Id lkey                           
0  foo  foo                        
1  bar  bar                        
2  baz  baz                        
3  foo  foo                        
4  bar  bar                        
5  foo  foo                        
6  bar  bar
7  bar  bar
8  bar  bar

df2 :
    e rkey value    y
0  aaa  foo   aaa  foo
1  NaN  bar   bbb  bar
2  ccc  baz   ccc  baz
3  NaN  mac   ddd  fff
4  NaN  xyz   eee  mmm
5  NaN  mnb   fff  NaN
6  NaN  foo   aaa  NaN

Edit1 : added 6th row as a duplicate.

I want perform one task on this dataframes. I want to compare lkey and rkey columns.

Edit2 :

Note : lkey column contains all duplicate values and rkey column contains some duplicate values.

Pick up first value of lkey column ie foo compare this value with values of rkey column of dataframe. If match is find I want to know this row's value of value column in the df1 dataframe column name as match. (In every case match will get for lkey and rkey ie whatever lkey values present in df1 available in the rkey column of df2.)

I'm already tried with merge .

result = df1.merge(df2, left_on='lkey', right_on='rkey', how='outer')

output :

     Id lkey    e rkey value    y
0   foo  foo  aaa  foo   aaa  foo
1   foo  foo  aaa  foo   aaa  foo
2   foo  foo  aaa  foo   aaa  foo
3   bar  bar  NaN  bar   bbb  bar
4   bar  bar  NaN  bar   bbb  bar
5   bar  bar  NaN  bar   bbb  bar
6   bar  bar  NaN  bar   bbb  bar
7   bar  bar  NaN  bar   bbb  bar
8   baz  baz  ccc  baz   ccc  baz
9   NaN  NaN  NaN  mac   ddd  fff
10  NaN  NaN  NaN  xyz   eee  mmm
11  NaN  NaN  NaN  mnb   fff  NaN

I don't want 11 rows. In my df1 only 9 rows are available with column Id and lkey. I just want to add match column with specific mapping.

Expected Output :

Id  lkey match
0  foo  foo  aaa
1  bar  bar  bbb
2  baz  baz  ccc
3  foo  foo  aaa
4  bar  bar  bbb
5  foo  foo  aaa
6  bar  bar  bbb
7  bar  bar  bbb
8  bar  bar  bbb

How I can achieve what I want to do?

Edit : previously I was saying rkey column contains unique values but I'm facing issue because of that only, rkey column contains duplicate values.

>>> (df1
     .merge(df2[['rkey', 'value']].drop_duplicates(), left_on='lkey', right_on='rkey', how='left')
     .drop('rkey', axis='columns')
     .rename(columns={'value': 'match'})
    )
    Id lkey match
0  foo  foo   aaa
1  bar  bar   bbb
2  baz  baz   ccc
3  foo  foo   aaa
4  bar  bar   bbb
5  foo  foo   aaa
6  bar  bar   bbb
7  bar  bar   bbb
8  bar  bar   bbb

If the key column in both dataframes had the same name, you can just use on='key' and wouldn't need to drop the right key.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM