
Combining 2 dataframes using an inner join and keeping only the last observation in Python

I have 2 dataframes and I want to join them on 2 columns, keeping only the last record when more than 1 record matches in the inner join.

DF1
[image of DF1]

DF2
[image of DF2]

When I combine both dataframes using an inner join on 'Patient_id' and 'diag_date', I get

[image of the merged result]

I want only idx '934814' of DF1 (Nasal Steroids) to map against '42775' of DF2, and not against any other index. I don't want to group by patient_id and take the last record afterwards; this has to happen while merging the 2 tables. I want only the last row in the inner join, instead of the join applying to all rows. Can you please suggest some solutions?

Thanks a lot!

Use DataFrame.drop_duplicates with keep='last' on the columns used for the join, before calling DataFrame.merge:

# Keep only the last DF1 row per join key, then inner-join (merge's default) with DF2
df = (DF1.drop_duplicates(['Patient_id','Prescription_date'], keep='last')
         .merge(DF2, on=['Patient_id','Prescription_date']))
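
To see the effect, here is a small self-contained sketch. The real DF1 and DF2 were only posted as images, so the rows, the 'Drug' and 'Diagnosis' columns, and every index value other than 934814 and 42775 are made up for illustration; only the join columns and the drop_duplicates/merge calls come from the answer above.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2 (the originals were only shown as images).
DF1 = pd.DataFrame(
    {
        "Patient_id": [1, 1, 2],
        "Prescription_date": ["2020-01-05", "2020-01-05", "2020-02-10"],
        "Drug": ["Antihistamines", "Nasal Steroids", "Antibiotics"],  # assumed column
    },
    index=[934812, 934814, 934900],
)
DF2 = pd.DataFrame(
    {
        "Patient_id": [1, 2, 3],
        "Prescription_date": ["2020-01-05", "2020-02-10", "2020-03-01"],
        "Diagnosis": ["Rhinitis", "Sinusitis", "Asthma"],  # assumed column
    },
    index=[42775, 42800, 42900],
)

keys = ["Patient_id", "Prescription_date"]

# Plain inner join: both DF1 rows for patient 1 are paired with the DF2 row.
plain = DF1.merge(DF2, on=keys)

# Drop the earlier duplicates of the join key first, so only the last
# DF1 row per (Patient_id, Prescription_date) takes part in the join.
last_only = DF1.drop_duplicates(keys, keep="last").merge(DF2, on=keys)

print(plain)
print(last_only)
```

Note that "last" here means last in DF1's current row order, so sort DF1 first if a different ordering should decide which row wins. Also, merging on columns gives the result a fresh integer index; if the original idx values are needed, call reset_index() on the frames before merging so they are carried along as a column.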
