How to compare two columns of two different dataframes whose columns are not unique?

I have two different dataframes, df1 and df2:

df1 :                                
    Id lkey                           
0  foo  foo                        
1  bar  bar                        
2  baz  baz                        
3  foo  foo                        
4  bar  bar                        
5  foo  foo                        
6  bar  bar
7  bar  bar
8  bar  bar

df2 :
    e rkey value    y
0  aaa  foo   aaa  foo
1  NaN  bar   bbb  bar
2  ccc  baz   ccc  baz
3  NaN  mac   ddd  fff
4  NaN  xyz   eee  mmm
5  NaN  mnb   fff  NaN
6  NaN  foo   aaa  NaN
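For anyone who wants to reproduce the problem, the two frames above can be rebuilt roughly like this. It is only a sketch: using np.nan for the NaN cells and plain strings for everything else is an assumption, since the original construction code is not shown.

```python
import numpy as np
import pandas as pd

# Rebuild df1 from the table above (Id and lkey hold the same strings).
df1 = pd.DataFrame({
    "Id":   ["foo", "bar", "baz", "foo", "bar", "foo", "bar", "bar", "bar"],
    "lkey": ["foo", "bar", "baz", "foo", "bar", "foo", "bar", "bar", "bar"],
})

# Rebuild df2; np.nan stands in for the NaN cells (an assumption).
df2 = pd.DataFrame({
    "e":     ["aaa", np.nan, "ccc", np.nan, np.nan, np.nan, np.nan],
    "rkey":  ["foo", "bar", "baz", "mac", "xyz", "mnb", "foo"],
    "value": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "aaa"],
    "y":     ["foo", "bar", "baz", "fff", "mmm", np.nan, np.nan],
})

print(df1.shape, df2.shape)
```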

Edit 1: added row 6 to df2 as a duplicate.

I want to perform one task on these dataframes: compare the lkey and rkey columns.

Edit 2:

Note: neither key column is unique. The lkey column consists mostly of repeated values, and the rkey column contains some duplicate values as well.

Take the first value of the lkey column, i.e. foo, and compare it with the values in the rkey column of df2. If a match is found, I want the value column entry from that df2 row to appear in df1 in a new column named match. (A match always exists: every lkey value present in df1 is also available in the rkey column of df2.)

I've already tried merge:

result = df1.merge(df2, left_on='lkey', right_on='rkey', how='outer')

Output:

     Id lkey    e rkey value    y
0   foo  foo  aaa  foo   aaa  foo
1   foo  foo  aaa  foo   aaa  foo
2   foo  foo  aaa  foo   aaa  foo
3   bar  bar  NaN  bar   bbb  bar
4   bar  bar  NaN  bar   bbb  bar
5   bar  bar  NaN  bar   bbb  bar
6   bar  bar  NaN  bar   bbb  bar
7   bar  bar  NaN  bar   bbb  bar
8   baz  baz  ccc  baz   ccc  baz
9   NaN  NaN  NaN  mac   ddd  fff
10  NaN  NaN  NaN  xyz   eee  mmm
11  NaN  NaN  NaN  mnb   fff  NaN

I don't want the extra rows in this output (it runs to index 11). df1 has only 9 rows, with the columns Id and lkey, and I just want to add a match column holding the corresponding value from df2.
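Why the merge grows can be seen by counting rows. The following is a small sketch, assuming the key and value columns exactly as listed above (the other df2 columns are left out for brevity). The indicator=True argument of merge adds a _merge column that says where each row came from.

```python
import pandas as pd

lkeys = ["foo", "bar", "baz", "foo", "bar", "foo", "bar", "bar", "bar"]
df1 = pd.DataFrame({"Id": lkeys, "lkey": lkeys})
df2 = pd.DataFrame({
    "rkey":  ["foo", "bar", "baz", "mac", "xyz", "mnb", "foo"],
    "value": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "aaa"],
})

# how='outer' keeps df2 keys that never occur in df1 (mac, xyz, mnb);
# those show up as 'right_only' in the _merge column.
outer = df1.merge(df2, left_on="lkey", right_on="rkey", how="outer", indicator=True)
print(outer["_merge"].value_counts())

# how='left' drops those rows, but the result is still longer than df1,
# because 'foo' occurs twice in df2.rkey and each df1 'foo' row matches both.
left = df1.merge(df2, left_on="lkey", right_on="rkey", how="left")
print(len(df1), len(left), len(outer))
```

So there are two separate sources of extra rows: unmatched right-hand keys (an effect of how='outer') and duplicated right-hand keys (which multiply the matching left-hand rows).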

Expected output:

    Id lkey match
0  foo  foo   aaa
1  bar  bar   bbb
2  baz  baz   ccc
3  foo  foo   aaa
4  bar  bar   bbb
5  foo  foo   aaa
6  bar  bar   bbb
7  bar  bar   bbb
8  bar  bar   bbb

How can I achieve this?

Edit: I previously said that the rkey column contains unique values, but the issue I'm facing arises precisely because the rkey column contains duplicate values.
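Since the duplicates in rkey are the crux, it may help to check which keys repeat and whether any repeated key points at more than one value. This is a diagnostic sketch only, assuming the rkey and value columns as listed above; the names dup_keys, values_per_key and conflicts are made up for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    "rkey":  ["foo", "bar", "baz", "mac", "xyz", "mnb", "foo"],
    "value": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "aaa"],
})

# Keys that occur more than once in rkey.
dup_keys = df2.loc[df2["rkey"].duplicated(keep=False), "rkey"].unique()
print(list(dup_keys))

# Number of distinct values per key; anything above 1 would be a conflict,
# i.e. a key for which "the" matching value is ambiguous.
values_per_key = df2.groupby("rkey")["value"].nunique()
conflicts = values_per_key[values_per_key > 1]
print(conflicts.empty)
```

If conflicts is empty, as it is for this data, every duplicated key maps to a single value and the duplicates can be collapsed safely.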

>>> (df1
...      .merge(df2[['rkey', 'value']].drop_duplicates(), left_on='lkey', right_on='rkey', how='left')
...      .drop('rkey', axis='columns')
...      .rename(columns={'value': 'match'})
... )
    Id lkey match
0  foo  foo   aaa
1  bar  bar   bbb
2  baz  baz   ccc
3  foo  foo   aaa
4  bar  bar   bbb
5  foo  foo   aaa
6  bar  bar   bbb
7  bar  bar   bbb
8  bar  bar   bbb
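As an alternative to the merge above (a different technique, not the one used in the answer), the same column can be built with Series.map, which looks up each lkey in a key-to-value Series and cannot add rows. A minimal sketch, assuming the frames from the question and assuming that keeping the first occurrence of each rkey is acceptable; lookup and result are illustrative names.

```python
import pandas as pd

lkeys = ["foo", "bar", "baz", "foo", "bar", "foo", "bar", "bar", "bar"]
df1 = pd.DataFrame({"Id": lkeys, "lkey": lkeys})
df2 = pd.DataFrame({
    "rkey":  ["foo", "bar", "baz", "mac", "xyz", "mnb", "foo"],
    "value": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "aaa"],
})

# One value per key: keep the first occurrence of each rkey, then index by it.
lookup = df2.drop_duplicates(subset="rkey").set_index("rkey")["value"]

# map returns exactly one entry per row of df1, so the row count is unchanged.
result = df1.assign(match=df1["lkey"].map(lookup))
print(result)
```

Note the difference from drop_duplicates() on the (rkey, value) pair: that only removes rows where both columns repeat, so a key with two different values would still multiply rows in the merge, whereas subset="rkey" silently keeps the first value.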

If the key column had the same name in both dataframes, you could just use on='key' and would not need to drop the right key.
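A sketch of that same-name case, assuming both key columns are renamed to a hypothetical shared name key (the frames in the question use lkey and rkey instead):

```python
import pandas as pd

keys = ["foo", "bar", "baz", "foo", "bar", "foo", "bar", "bar", "bar"]
# Hypothetical variants of df1 and df2 in which both key columns are called 'key'.
left = pd.DataFrame({"Id": keys, "key": keys})
right = pd.DataFrame({
    "key":   ["foo", "bar", "baz", "mac", "xyz", "mnb", "foo"],
    "value": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "aaa"],
})

# With on='key' the result has a single key column, so no drop() is needed.
result = (left
          .merge(right[["key", "value"]].drop_duplicates(), on="key", how="left")
          .rename(columns={"value": "match"}))
print(result)
```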
