Assign a column from another dataframe of a different size by key
I have two dataframes:
df1 =
key  A  B  C
r1   1  2  7
r2   6  3  3
df2 =
key  A  B  C  D  E
r1   1  2  3  4  7
r1   1  3  2  1  5
r2   5  7  1  2  2
r2   6  2  4  9  3
r3   1  2  7  7  1
r4   9  0  2  1  2
I want to add a column E to df1, taking for each key the value of E from the first occurrence of that key in df2.
So df1 will be:
df1 =
key  A  B  C  E
r1   1  2  7  7
r2   6  3  3  2
What is the best way to do so?
Use GroupBy.first with DataFrame.join:
df = df1.join(df2.groupby('key')['E'].first(), on='key')
print(df)
  key  A  B  C  E
0  r1  1  2  7  7
1  r2  6  3  3  2
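For anyone who wants to run the first solution as-is, here is a minimal self-contained sketch. The frames are rebuilt from the tables in the question, assuming `key` is an ordinary column rather than the index; the name `first_e` is only illustrative.

```python
import pandas as pd

# Frames reconstructed from the question; `key` is assumed to be a regular column.
df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# One value of E per key: a Series named "E" indexed by "key".
first_e = df2.groupby("key")["E"].first()

# Look up each df1["key"] in that index and attach the matching E.
df = df1.join(first_e, on="key")
print(df)
```

One thing to keep in mind: GroupBy.first returns the first non-null value in each group, so if E can contain NaN the result may differ from "the value in the first row for that key".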
Or DataFrame.drop_duplicates with DataFrame.merge:
df = df1.merge(df2.drop_duplicates('key')[['key','E']], on='key', how='left')
print(df)
  key  A  B  C  E
0  r1  1  2  7  7
1  r2  6  3  3  2
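A runnable sketch of the second solution follows. To show what `how='left'` is for, df1 gets one extra row with key `r9`, which is hypothetical and not part of the question; the name `first_rows` is likewise only illustrative.

```python
import pandas as pd

# df1 from the question plus one hypothetical row ("r9") whose key is absent from df2.
df1 = pd.DataFrame({
    "key": ["r1", "r2", "r9"],
    "A": [1, 6, 0],
    "B": [2, 3, 0],
    "C": [7, 3, 0],
})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# Keep only the first row for each key, and only the columns needed for the merge.
first_rows = df2.drop_duplicates("key")[["key", "E"]]

# A left merge keeps every row of df1; keys missing from df2 get NaN in E.
df = df1.merge(first_rows, on="key", how="left")
print(df)
```

Because the unmatched `r9` row receives NaN, column E is upcast to float here. Unlike GroupBy.first, drop_duplicates keeps the literal first row per key, including a NaN in E if that row has one.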
EDIT:
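If column E might not exist in df2, modify the second solution with Index.intersection, which keeps only the requested columns that are actually present. Selecting ['key', 'E'] directly would raise a KeyError when E is missing; with the intersection, the merge simply adds nothing: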
print(df2)
  key  A  B  C  D  E1
0  r1  1  2  3  4   7
1  r1  1  3  2  1   5
2  r2  5  7  1  2   2
3  r2  6  2  4  9   3
4  r3  1  2  7  7   1
5  r4  9  0  2  1   2
cols = ['key'] + df2.columns.intersection(['E']).tolist()
print(cols)
['key']
df = df1.merge(df2.drop_duplicates('key')[cols], on='key', how='left')
print(df)
  key  A  B  C
0  r1  1  2  7
1  r2  6  3  3
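The same idea can be wrapped in a small helper so it works whether or not the column is present. This is only a sketch: `add_first_match` and its parameter names are hypothetical, not a pandas API, and `df2_no_e` imitates the frame above by renaming E to E1.

```python
import pandas as pd

def add_first_match(left, right, key, wanted):
    """Left-merge onto `left` the first-occurrence values of `wanted` columns from `right`.

    Hypothetical helper: columns in `wanted` that are missing from `right` are skipped.
    """
    # Keep only the requested columns that really exist in `right` (and are not the key).
    present = [c for c in right.columns.intersection(wanted).tolist() if c != key]
    cols = [key] + present
    return left.merge(right.drop_duplicates(key)[cols], on=key, how="left")

df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})
df2_no_e = df2.rename(columns={"E": "E1"})  # same data, but no column named E

with_e = add_first_match(df1, df2, "key", ["E"])
without_e = add_first_match(df1, df2_no_e, "key", ["E"])
print(with_e)
print(without_e)
```

When E is missing, the helper merges on `key` alone and df1 comes back with its original columns, which matches the output shown above.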