How to keep duplicated "on" column value rows in merge_asof in pandas

I have two data frames that I want to merge with the merge_asof function, but I am facing a small issue.

import pandas as pd

radiology_scan = pd.DataFrame({
    "patient_id": ["a001", "a002", "a003"],
    "Visit_year": [2010, 2012, 2016],
    "scan_report": [1, 55, 2],
})

OPD_data = pd.DataFrame({
    "patient_id": ["a001", "a001", "a002", "a003", "a004", "a005"],
    "Visit_year": [2010, 2010, 2013, 2017, 2017, 2018],
    "doctor_comments": [
        "diagnosis normal, xyz symptoms",
        "diagnosis normal, abc symptoms",
        "diagnosis abnormal, pqr symptoms",
        "diagnosis abnormal, apq symptoms",
        "diagnosis normal, xzy symptoms",
        "diagnosis abnormal, yzx symptoms",
    ],
})


x = pd.merge_asof(radiology_scan, OPD_data, on=["Visit_year"], by=["patient_id"], direction="nearest")

# With merge_asof, only one of a001's two 2010 OPD visits appears in the merged data frame.


y = pd.merge(radiology_scan, OPD_data, on=["Visit_year", "patient_id"], how="left")

# merge returns both OPD visits for the matching visit year, but it cannot match the
# nearest visit for patients whose visit years differ between the two data frames,
# which is why I want to use merge_asof.

When using the merge_asof function, I am unable to retain the second visit of patient a001 from OPD_data in the merged data frame, even though the id and the visit year match. The doctor_comments value is different, so I want to retain it in the merged data frame.

On the other hand, with the merge function I am able to retain it, but I cannot do an as-of match of the other patients' data to the nearest visit year.

The output that I expect is: [image: expected output]

Reversing the input order may produce what you want, because merge_asof is a "left join" operation that returns one row for each row of the left data frame:

x = pd.merge_asof(OPD_data, radiology_scan, on=["Visit_year"], by=["patient_id"], direction="nearest")

output:

  patient_id  Visit_year                   doctor_comments  scan_report
0       a001        2010    diagnosis normal, xyz symptoms          1.0
1       a001        2010    diagnosis normal, abc symptoms          1.0
2       a002        2013  diagnosis abnormal, pqr symptoms         55.0
3       a003        2017  diagnosis abnormal, apq symptoms          2.0
4       a004        2017    diagnosis normal, xzy symptoms          NaN
5       a005        2018  diagnosis abnormal, yzx symptoms          NaN

Note that Visit_year is taken from OPD_data in this case.
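To make the left-join behaviour concrete, here is a minimal, self-contained sketch that runs merge_asof in both orders on the question's data and compares the row counts. It also shows one possible way to keep the scan's own year: copy it into an extra column before merging. The column name scan_year and the variable names scan_left and opd_left are made up for this illustration and are not part of the original question or answer.

```python
import pandas as pd

radiology_scan = pd.DataFrame({
    "patient_id": ["a001", "a002", "a003"],
    "Visit_year": [2010, 2012, 2016],
    "scan_report": [1, 55, 2],
})

OPD_data = pd.DataFrame({
    "patient_id": ["a001", "a001", "a002", "a003", "a004", "a005"],
    "Visit_year": [2010, 2010, 2013, 2017, 2017, 2018],
    "doctor_comments": [
        "diagnosis normal, xyz symptoms",
        "diagnosis normal, abc symptoms",
        "diagnosis abnormal, pqr symptoms",
        "diagnosis abnormal, apq symptoms",
        "diagnosis normal, xzy symptoms",
        "diagnosis abnormal, yzx symptoms",
    ],
})

# merge_asof requires both frames to be sorted by the "on" key;
# the sample data already is, so no sort_values call is needed here.

# Scans on the left: one output row per scan, so a001 appears once.
scan_left = pd.merge_asof(
    radiology_scan, OPD_data,
    on="Visit_year", by="patient_id", direction="nearest",
)

# OPD visits on the left: one output row per visit, so both a001 visits survive.
# scan_year is a hypothetical helper column that preserves the scan's own year,
# because the merged Visit_year column comes from the left frame (OPD_data).
scans_with_year = radiology_scan.assign(scan_year=radiology_scan["Visit_year"])
opd_left = pd.merge_asof(
    OPD_data, scans_with_year,
    on="Visit_year", by="patient_id", direction="nearest",
)

print(len(scan_left), len(opd_left))  # 3 6
print(opd_left[["patient_id", "Visit_year", "scan_year", "scan_report"]])
```

Patients a004 and a005 have no scan, so their scan_year and scan_report are NaN; whether to keep or drop those rows depends on the result you need.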
