
Joining two dataframes on unique ID, but using another value if id doesn't exist

I have two dataframes as such:

UID    mainColumn .... (other columns of data)
1      apple
2      orange
3      apple
4      orange
5      berry
....

UID2   mainColumn2
1      truck
3      car
4      boat
5      plane
...

I need to join the second dataframe onto the first based on UID. However, if df2 does not contain a given UID, then the mainColumn value from df1 is the one I'd like to use. In the above example, UID2 does not contain the value 2, so the final table would look something like

UID    mainColumn ....
1      truck
2      orange
3      car
4      boat
5      plane
...

Now I'm aware we can do something in the form of

df1=df1.merge(df2,left_on='UID', right_on='UID2')

But the issue I have is with the UIDs that are missing from df2: their rows should not be replaced, and they should still be included in the result. Thanks!
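To make the problem concrete, here is a minimal sketch built from the sample data in the question. It assumes the default behaviour of merge (how='inner'), which keeps only the UIDs present in both frames, and compares it with how='left', which keeps every row of df1 and leaves NaN where df2 has no match.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

# Default merge is an inner join: UID 2 has no partner in df2 and is dropped.
inner = df1.merge(df2, left_on="UID", right_on="UID2")

# A left join keeps every row of df1; unmatched rows get NaN in df2's columns.
left = df1.merge(df2, left_on="UID", right_on="UID2", how="left")

print(inner["UID"].tolist())                # [1, 3, 4, 5]
print(left["UID"].tolist())                 # [1, 2, 3, 4, 5]
print(left["mainColumn2"].isna().tolist())  # [False, True, False, False, False]
```

So the row for UID 2 survives only with a left join, and its NaN in mainColumn2 is what still has to be filled from mainColumn.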

You can use combine_first() after renaming the columns of df2 to match those of df1 (e.g. UID2 to UID):

df2 = df2.rename(columns={'UID2': 'UID', 'mainColumn2': 'mainColumn'})  # rename only the matching columns
final_df=df2.set_index('UID').combine_first(df1.set_index('UID')).reset_index()

  UID mainColumn
0    1      truck
1    2     orange
2    3        car
3    4       boat
4    5      plane

We can first use merge with how='left', then fillna the missing values, and finally drop the extra columns:

final = df1.merge(df2, left_on='UID', right_on='UID2', how='left').drop('UID2', axis=1)

final['mainColumn'] = final['mainColumn2'].fillna(final['mainColumn'])

final.drop('mainColumn2', axis=1, inplace=True)

   UID mainColumn
0    1      truck
1    2     orange
2    3        car
3    4       boat
4    5      plane
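The same steps can be wrapped up as a runnable sketch. The helper name overlay_values is hypothetical and only for illustration, and the sketch assumes UID2 is unique in df2 (a left merge would otherwise duplicate rows of df1).

```python
import pandas as pd


def overlay_values(df1, df2):
    """Hypothetical helper: take mainColumn2 from df2 where the UID matches,
    otherwise keep df1's own mainColumn. Assumes UID2 is unique in df2."""
    # Left join keeps every row of df1, matched or not.
    merged = df1.merge(df2, left_on="UID", right_on="UID2", how="left")
    # Prefer df2's value; fall back to df1's value where it is NaN.
    merged["mainColumn"] = merged["mainColumn2"].fillna(merged["mainColumn"])
    # Remove the helper columns that came from df2.
    return merged.drop(columns=["UID2", "mainColumn2"])


df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

final = overlay_values(df1, df2)
print(final["mainColumn"].tolist())
# ['truck', 'orange', 'car', 'boat', 'plane']
```

Returning a new frame rather than dropping with inplace=True leaves df1 and df2 untouched, which makes the step easier to rerun while experimenting.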

