I have two dataframes like these:
UID mainColumn .... (other columns of data)
1 apple
2 orange
3 apple
4 orange
5 berry
....
UID2 mainColumn2
1 truck
3 car
4 boat
5 plane
...
I need to join the second dataframe onto the first based on UID. However, if df2 does not contain a given UID, then the existing mainColumn value is the one I'd like to keep. In the above example, UID2 does not contain the value 2, so the final table would look something like:
UID mainColumn ....
1 truck
2 orange
3 car
4 boat
5 plane
...
Now I'm aware we can do something in the form of
df1=df1.merge(df2,left_on='UID', right_on='UID2')
But my issue is keeping the rows whose UID is missing from df2: they should still be included, with their original mainColumn value left in place. Thanks!
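To make the problem concrete, here is a minimal sketch of why the merge call above loses rows. merge defaults to how='inner', so any UID absent from df2 is dropped, while how='left' keeps every row of df1. The sample frames are reconstructed from the tables in the question, and the names inner and left are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's tables.
df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

# Default merge is an inner join: UID 2 has no match in df2 and disappears.
inner = df1.merge(df2, left_on="UID", right_on="UID2")

# A left join keeps all rows of df1; unmatched rows get NaN in df2's columns.
left = df1.merge(df2, left_on="UID", right_on="UID2", how="left")

print(inner["UID"].tolist())
print(left["UID"].tolist())
```

With a left join, the only remaining work is filling the NaN in mainColumn2 from mainColumn.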
You can use combine_first() after renaming the columns of df2 to match df1 (e.g. UID2 to UID):
df2 = df2.rename(columns={'UID2': 'UID', 'mainColumn2': 'mainColumn'})  # rename only the matching columns
final_df=df2.set_index('UID').combine_first(df1.set_index('UID')).reset_index()
UID mainColumn
0 1 truck
1 2 orange
2 3 car
3 4 boat
4 5 plane
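For anyone who wants to run this end to end, here is a self-contained sketch of the combine_first approach. The sample frames are reconstructed from the question; the qty column is a made-up stand-in for the "other columns of data", and renamed is just an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the question; "qty" is a hypothetical extra column.
df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
    "qty": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

# Rename df2's columns so they line up with the matching columns of df1.
renamed = df2.rename(columns={"UID2": "UID", "mainColumn2": "mainColumn"})

# Values from `renamed` take priority; anything it lacks (UID 2, the qty column)
# is filled in from df1.
final_df = (
    renamed.set_index("UID")
    .combine_first(df1.set_index("UID"))
    .reset_index()
)
print(final_df)
```

Note that combine_first aligns on both index and columns, so columns that exist only in df1 are carried over unchanged.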
We can first use merge, then fillna the missing values, and finally drop the extra column:
final = df1.merge(df2, left_on='UID', right_on='UID2', how='left').drop('UID2', axis=1)
final['mainColumn'] = final['mainColumn2'].fillna(final['mainColumn'])
final.drop('mainColumn2', axis=1, inplace=True)
UID mainColumn
0 1 truck
1 2 orange
2 3 car
3 4 boat
4 5 plane
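The same steps as a runnable sketch, with sample frames reconstructed from the question. The qty column is a made-up stand-in for the other columns of df1, included only to show that they pass through untouched.

```python
import pandas as pd

# Sample data reconstructed from the question; "qty" is a hypothetical extra column.
df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
    "qty": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

# Left join keeps every row of df1; the duplicate key column UID2 is dropped.
final = df1.merge(df2, left_on="UID", right_on="UID2", how="left").drop(columns="UID2")

# Prefer df2's value; where it is NaN (no match), fall back to df1's value.
final["mainColumn"] = final["mainColumn2"].fillna(final["mainColumn"])

# Remove the helper column now that its values have been copied over.
final = final.drop(columns="mainColumn2")
print(final)
```

This assumes UID2 is unique in df2; with duplicate keys, the left join would repeat the matching rows of df1.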