
Assign columns to a DataFrame from another DataFrame of a different size, by key

I have two dataframes:

df1 = 
key A  B  C  
r1  1  2  7  
r2  6  3  3  

df2 = 
key A  B  C  D  E
r1  1  2  3  4  7
r1  1  3  2  1  5
r2  5  7  1  2  2
r2  6  2  4  9  3
r3  1  2  7  7  1
r4  9  0  2  1  2

I want to add a column E to df1, taking for each key the value from the first occurrence of that key in df2.

So df1 will be:

df1 = 
key A  B  C  E
r1  1  2  7  7
r2  6  3  3  2

What is the best way to do so?

Use GroupBy.first with DataFrame.join:

df = df1.join(df2.groupby('key')['E'].first(), on='key')
print (df)
  key  A  B  C  E
0  r1  1  2  7  7
1  r2  6  3  3  2
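
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the join. The frame construction and the variable names first_e and result are my own; only the join line comes from the solution above. It assumes key is an ordinary column in both frames, as in the printed output.

```python
import pandas as pd

# Sample data copied from the question; "key" is a regular column, not the index.
df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# One value of E per key: a Series named "E", indexed by key.
first_e = df2.groupby("key")["E"].first()

# Look up each df1["key"] in that index and attach the match as column E.
result = df1.join(first_e, on="key")
print(result)
```

Keys that exist only in df2 (r3 and r4 here) are simply ignored, because join keeps the rows of df1.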

Or DataFrame.drop_duplicates with DataFrame.merge:

df = df1.merge(df2.drop_duplicates('key')[['key','E']], on='key', how='left')
print (df)
  key  A  B  C  E
0  r1  1  2  7  7
1  r2  6  3  3  2
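
The two approaches give the same result on the question's data, but they can differ when E has missing values: GroupBy.first returns the first non-null value in each group by default, while drop_duplicates keeps the first row whatever it contains. The sketch below uses made-up data (df2_nan is not from the question) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first r1 row has a missing E.
df1 = pd.DataFrame({"key": ["r1", "r2"]})
df2_nan = pd.DataFrame({"key": ["r1", "r1", "r2"], "E": [np.nan, 5.0, 2.0]})

# GroupBy.first skips the NaN and takes 5.0 for r1.
via_groupby = df1.join(df2_nan.groupby("key")["E"].first(), on="key")

# drop_duplicates keeps the first r1 row as it is, so E stays NaN for r1.
via_merge = df1.merge(
    df2_nan.drop_duplicates("key")[["key", "E"]], on="key", how="left"
)

print(via_groupby)
print(via_merge)
```

If "first occurrence" should mean the first row regardless of missing values, the drop_duplicates version is probably the safer choice.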

EDIT:

If column E might not exist in df2, modify the second solution with Index.intersection so that only the columns actually present are selected:

print (df2)
  key  A  B  C  D  E1
0  r1  1  2  3  4   7
1  r1  1  3  2  1   5
2  r2  5  7  1  2   2
3  r2  6  2  4  9   3
4  r3  1  2  7  7   1
5  r4  9  0  2  1   2

cols = ['key'] + df2.columns.intersection(['E']).tolist()
print (cols)
['key']

df = df1.merge(df2.drop_duplicates('key')[cols], on='key', how='left')
print (df)
  key  A  B  C
0  r1  1  2  7
1  r2  6  3  3
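
If this is needed in more than one place, the same idea can be wrapped in a small function. This is only a sketch: add_first_match and its parameter names are hypothetical, not part of pandas, and it assumes the wanted columns do not include the key itself and do not already exist in the left frame.

```python
import pandas as pd

def add_first_match(left, right, key, wanted):
    """Hypothetical helper: left-merge the first row per key from `right`,
    taking only those `wanted` columns that really exist in `right`."""
    # Index.intersection silently drops names that are missing from right.
    cols = [key] + right.columns.intersection(wanted).tolist()
    return left.merge(right.drop_duplicates(key)[cols], on=key, how="left")

df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
keys = ["r1", "r1", "r2", "r2", "r3", "r4"]
values = [7, 5, 2, 3, 1, 2]

with_e = pd.DataFrame({"key": keys, "E": values})      # column E present
without_e = pd.DataFrame({"key": keys, "E1": values})  # column E missing

out_with = add_first_match(df1, with_e, "key", ["E"])        # gains column E
out_without = add_first_match(df1, without_e, "key", ["E"])  # columns unchanged
print(out_with)
print(out_without)
```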
