How to keep rows with duplicated "on" column values in merge_asof in pandas
I have two data frames that I want to merge with the merge_asof function, but I am facing a small issue.
import pandas as pd
radiology_scan = pd.DataFrame({"patient_id":['a001','a002','a003'],
"Visit_year": [2010, 2012, 2016], "scan_report": [1, 55,2]})
OPD_data = pd.DataFrame({"patient_id":['a001','a001','a002','a003','a004','a005'],
"Visit_year": [2010 , 2010 , 2013, 2017, 2017,2018],
"doctor_comments": ['diagnosis normal, xyz symptoms',
'diagnosis normal, abc symptoms',
'diagnosis abnormal, pqr symptoms',
'diagnosis abnormal, apq symptoms',
'diagnosis normal, xzy symptoms',
'diagnosis abnormal, yzx symptoms' ]})
x = pd.merge_asof(radiology_scan, OPD_data, on = ["Visit_year"], by = ["patient_id"] , direction="nearest")
# when using merge_asof only the first visit of the a001's OPD is seen in the merged data frame
y = pd.merge(radiology_scan, OPD_data, on =["Visit_year","patient_id" ], how = 'left' )
# merge gives both visits for the matched visit year, but it cannot match the nearest visits of other patients
# whose visit years differ between the two data frames, hence I want to use merge_asof
When using merge_asof, I am unable to retain the second visit of patient a001 from OPD_data in the merged data frame, even though the patient id and the visit year match. The doctor_comments value is different, and hence I want to retain it in the merged data frame.
On the other hand, when using merge I am able to retain it, but I cannot match other patients' data to the nearest visit year.
Reversing the input order may produce what you want, as merge_asof is a "left join" operation:
x = pd.merge_asof(OPD_data, radiology_scan, on=["Visit_year"], by=["patient_id"], direction="nearest")
Output:
  patient_id  Visit_year                   doctor_comments  scan_report
0       a001        2010    diagnosis normal, xyz symptoms          1.0
1       a001        2010    diagnosis normal, abc symptoms          1.0
2       a002        2013  diagnosis abnormal, pqr symptoms         55.0
3       a003        2017  diagnosis abnormal, apq symptoms          2.0
4       a004        2017    diagnosis normal, xzy symptoms          NaN
5       a005        2018  diagnosis abnormal, yzx symptoms          NaN
Note that Visit_year is taken from OPD_data in this case.
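If the scans need to stay on the left, so that Visit_year comes from radiology_scan while every matching OPD row is still kept, one possible approach is to split the work in two steps: use merge_asof only to find the nearest OPD year for each scan, then do an ordinary merge on that year to pull in all OPD rows that share it. This is only a sketch under those assumptions; the helper column opd_year and the names opd_years, nearest and result are made up for illustration and are not part of the answer above.

```python
import pandas as pd

radiology_scan = pd.DataFrame({
    "patient_id": ["a001", "a002", "a003"],
    "Visit_year": [2010, 2012, 2016],
    "scan_report": [1, 55, 2],
})
OPD_data = pd.DataFrame({
    "patient_id": ["a001", "a001", "a002", "a003", "a004", "a005"],
    "Visit_year": [2010, 2010, 2013, 2017, 2017, 2018],
    "doctor_comments": [
        "diagnosis normal, xyz symptoms",
        "diagnosis normal, abc symptoms",
        "diagnosis abnormal, pqr symptoms",
        "diagnosis abnormal, apq symptoms",
        "diagnosis normal, xzy symptoms",
        "diagnosis abnormal, yzx symptoms",
    ],
})

# Step 1: one row per (patient, OPD year); copy the year into a helper
# column (hypothetical name "opd_year") so it survives merge_asof.
opd_years = (
    OPD_data[["patient_id", "Visit_year"]]
    .drop_duplicates()
    .sort_values("Visit_year")
    .assign(opd_year=lambda d: d["Visit_year"])
)

# Step 2: for each scan, find the nearest OPD year of the same patient.
# Both inputs must be sorted by the "on" column.
nearest = pd.merge_asof(
    radiology_scan.sort_values("Visit_year"),
    opd_years,
    on="Visit_year",
    by="patient_id",
    direction="nearest",
)

# Step 3: an ordinary left merge on the matched year brings back every
# OPD row for that patient and year, including duplicated years.
result = nearest.merge(
    OPD_data.rename(columns={"Visit_year": "opd_year"}),
    on=["patient_id", "opd_year"],
    how="left",
)
print(result)
```

With the sample data this gives four rows: both 2010 comments for a001, plus the 2013 visit for a002 and the 2017 visit for a003, with Visit_year still holding the scan year. A scan with no OPD visit at all would get NaN in opd_year, so the second merge should leave doctor_comments empty for that row.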