
dataframe assign columns from other dataframe with different size by key

I have two dataframes:

df1 = 
key A  B  C  
r1  1  2  7  
r2  6  3  3  

df2 = 
key A  B  C  D  E
r1  1  2  3  4  7
r1  1  3  2  1  5
r2  5  7  1  2  2
r2  6  2  4  9  3
r3  1  2  7  7  1
r4  9  0  2  1  2

I want to add a column E to df1, taking its value from the first occurrence of each key in df2.

So df1 will be:

df1 = 
key A  B  C  E
r1  1  2  7  7
r2  6  3  3  2

What is the best way to do so?

Use GroupBy.first with DataFrame.join:

df = df1.join(df2.groupby('key')['E'].first(), on='key')
print (df)
  key  A  B  C  E
0  r1  1  2  7  7
1  r2  6  3  3  2
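
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame constructors are reconstructed from the tables in the question, and the names first_e and joined are only illustrative. One caveat worth knowing: GroupBy.first returns the first non-null value in each group, so if E can contain NaN the result may differ from "the value in the first row".

```python
import pandas as pd

# Data reconstructed from the question's tables
df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# Series indexed by key, holding the first non-null E per key
first_e = df2.groupby("key")["E"].first()

# Join that Series to df1, matching df1's 'key' column against the Series index
joined = df1.join(first_e, on="key")
print(joined)
```

Because the grouped Series is indexed by key, on='key' tells join to look up df1's key column in that index; keys that appear only in df2 (r3, r4) are simply ignored.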

Or DataFrame.drop_duplicates with DataFrame.merge:

df = df1.merge(df2.drop_duplicates('key')[['key','E']], on='key', how='left')
print (df)
  key  A  B  C  E
0  r1  1  2  7  7
1  r2  6  3  3  2
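
To show what how='left' buys you, the sketch below adds a hypothetical row with key r9 to df1 (it is not in the question's data) that has no match in df2. The validate argument is an optional extra, not part of the answer above; it makes merge raise if the right side still has duplicate keys.

```python
import pandas as pd

# df1 from the question plus a hypothetical key 'r9' that is absent from df2
df1 = pd.DataFrame({"key": ["r1", "r2", "r9"], "A": [1, 6, 0], "B": [2, 3, 0], "C": [7, 3, 0]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# Keep only the first row per key, and only the columns needed for the merge
first_rows = df2.drop_duplicates("key")[["key", "E"]]

# Left merge keeps every row of df1; unmatched keys get NaN in E
merged = df1.merge(first_rows, on="key", how="left", validate="many_to_one")
print(merged)
```

Note that the NaN for r9 forces column E to a float dtype. Unlike GroupBy.first, drop_duplicates keeps the literal first row for each key, even when its E value is missing.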

EDIT:

If column E might not exist in df2, modify the second solution to select columns with Index.intersection:

print (df2)
  key  A  B  C  D  E1
0  r1  1  2  3  4   7
1  r1  1  3  2  1   5
2  r2  5  7  1  2   2
3  r2  6  2  4  9   3
4  r3  1  2  7  7   1
5  r4  9  0  2  1   2

cols = ['key'] + df2.columns.intersection(['E']).tolist()
print (cols)
['key']

df = df1.merge(df2.drop_duplicates('key')[cols], on='key', how='left')
print (df)
  key  A  B  C
0  r1  1  2  7
1  r2  6  3  3
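
The same guard can be wrapped in a small function. This is only a sketch: add_first_match and its parameter names are hypothetical, not a pandas API, and it assumes the wanted columns do not already exist in the left frame (otherwise merge would add suffixes).

```python
import pandas as pd

def add_first_match(left, right, key, wanted):
    """Left-merge onto `left` those `wanted` columns that exist in `right`,
    taking values from the first row per key. (Hypothetical helper.)"""
    # Index.intersection silently drops names that are missing from `right`
    cols = [key] + right.columns.intersection(wanted).tolist()
    return left.merge(right.drop_duplicates(key)[cols], on=key, how="left")

df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
# df2 as in the EDIT above: the column is named E1, not E
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E1": [7, 5, 2, 3, 1, 2],
})

without_e = add_first_match(df1, df2, "key", ["E"])   # E is missing: df1 comes back unchanged
with_e1 = add_first_match(df1, df2, "key", ["E1"])    # E1 exists: it is added
print(without_e)
print(with_e1)
```

When the requested column is missing, the merge happens on key alone and no error is raised, which is the behaviour shown in the output above.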
