I have two dataframes:
df1 =
key A B C
r1 1 2 7
r2 6 3 3
df2 =
key A B C D E
r1 1 2 3 4 7
r1 1 3 2 1 5
r2 5 7 1 2 2
r2 6 2 4 9 3
r3 1 2 7 7 1
r4 9 0 2 1 2
I want to add a column E to df1, taking for each key the value of E from the first occurrence of that key in df2.
So df1 will be:
df1 =
key A B C E
r1 1 2 7 7
r2 6 3 3 2
What is the best way to do so?
Use GroupBy.first with DataFrame.join:
# first E value per key in df2, joined onto df1 by its 'key' column
df = df1.join(df2.groupby('key')['E'].first(), on='key')
print(df)
key A B C E
0 r1 1 2 7 7
1 r2 6 3 3 2
Or DataFrame.drop_duplicates with DataFrame.merge:
# keep only the first row per key in df2, then left-merge key and E into df1
df = df1.merge(df2.drop_duplicates('key')[['key','E']], on='key', how='left')
print(df)
key A B C E
0 r1 1 2 7 7
1 r2 6 3 3 2
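For anyone who wants to run both solutions, here is a minimal self-contained sketch that rebuilds df1 and df2 from the question. The variable names via_join, via_merge and df2_nan are only illustrative. The last part shows one difference that may matter: GroupBy.first returns the first non-null value in each group, while drop_duplicates keeps the first row as it is, even when E is missing there.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# Solution 1: first E per key, joined on df1's 'key' column
via_join = df1.join(df2.groupby("key")["E"].first(), on="key")

# Solution 2: first row per key, then a left merge
via_merge = df1.merge(df2.drop_duplicates("key")[["key", "E"]], on="key", how="left")

print(via_join)
print(via_merge)

# Illustrative variant (not in the question): E is missing in the first r1 row
df2_nan = df2.astype({"E": "float"})
df2_nan.loc[0, "E"] = float("nan")

# GroupBy.first skips the NaN and picks the next value for r1
first_non_null = df2_nan.groupby("key")["E"].first()["r1"]
# drop_duplicates keeps the first r1 row, so E stays NaN
first_row = df2_nan.drop_duplicates("key").set_index("key")["E"]["r1"]
```

If "first occurrence" should mean the first row regardless of missing values, the drop_duplicates version is probably the safer choice.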
EDIT:
If column E might not exist in df2, modify the second solution to build the column list with Index.intersection, so a missing column is skipped instead of raising a KeyError:
print (df2)
key A B C D E1
0 r1 1 2 3 4 7
1 r1 1 3 2 1 5
2 r2 5 7 1 2 2
3 r2 6 2 4 9 3
4 r3 1 2 7 7 1
5 r4 9 0 2 1 2
cols = ['key'] + df2.columns.intersection(['E']).tolist()
print (cols)
['key']
df = df1.merge(df2.drop_duplicates('key')[cols], on='key', how='left')
print (df)
key A B C
0 r1 1 2 7
1 r2 6 3 3
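The Index.intersection variant can also be wrapped in a small helper. This is only a sketch: the function name add_first_value and its parameters are hypothetical and not part of pandas, and it assumes col is different from key.

```python
import pandas as pd

def add_first_value(left, right, col, key="key"):
    """Left-merge the first `col` value per `key` from `right` into `left`.

    Hypothetical helper: if `col` is absent from `right`, `left` is
    returned with its original columns only.
    """
    # Keep `col` only if it really exists in `right`
    cols = [key] + right.columns.intersection([col]).tolist()
    return left.merge(right.drop_duplicates(key)[cols], on=key, how="left")

# Data from the question
df1 = pd.DataFrame({"key": ["r1", "r2"], "A": [1, 6], "B": [2, 3], "C": [7, 3]})
df2 = pd.DataFrame({
    "key": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "A": [1, 1, 5, 6, 1, 9],
    "B": [2, 3, 7, 2, 2, 0],
    "C": [3, 2, 1, 4, 7, 2],
    "D": [4, 1, 2, 9, 7, 1],
    "E": [7, 5, 2, 3, 1, 2],
})

# Column E present: it is added to df1
result_present = add_first_value(df1, df2, "E")

# Column E renamed to E1 (as in the EDIT): nothing is added, no error
result_missing = add_first_value(df1, df2.rename(columns={"E": "E1"}), "E")

print(result_present)
print(result_missing)
```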