Python: find most recent date in one column with no matching date in another

I have two date columns representing clients' entries into and exits from facilities.

ID      entry_date  exit_date   original_entrydate
003246  2022-03-22  NaN         2012-10-01
003246  2015-07-24  2022-03-22  2012-10-01
003246  2012-10-01  2015-07-24  2012-10-01
003246  2001-02-02  2010-04-05  2001-02-02

For every row of a given ID in the table, I need to match entry_date against exit_date to find the most recent entry date that marks the beginning of an uninterrupted span of time in which that ID was moving between facilities but not leaving care, and return it in a column, original_entrydate.

In the example, the value of original_entrydate for the first three rows would be 2012-10-01, because that entry_date does not match any exit_date. The mismatch indicates a separation from care, which the dates show lasted for two years and some months. If there were additional records for that ID, the process would reset and find the original_entrydate for any records preceding that separation from care, up to the next separation.
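To make the rule concrete, here is a minimal sketch in plain Python, not a definitive implementation. It assumes the rows belong to a single ID and are sorted by entry_date descending, as in the table; the helper name `original_entry_dates` and the tuple layout are hypothetical and used only for illustration.

```python
from datetime import date

# Example rows for one ID, sorted by entry_date descending (as in the table).
# Each tuple is (entry_date, exit_date); exit_date is None while the stay is open.
rows = [
    (date(2022, 3, 22), None),
    (date(2015, 7, 24), date(2022, 3, 22)),
    (date(2012, 10, 1), date(2015, 7, 24)),
    (date(2001, 2, 2), date(2010, 4, 5)),
]


def original_entry_dates(rows):
    """Return one original entry date per row (hypothetical helper).

    Assumes all rows belong to a single ID and are sorted newest first.
    """
    result = [None] * len(rows)
    span_start = None
    # Walk from the oldest row to the newest, carrying the start of the current span.
    for i in range(len(rows) - 1, -1, -1):
        entry, _ = rows[i]
        is_oldest = i == len(rows) - 1
        if is_oldest or entry != rows[i + 1][1]:
            # No earlier stay ended on this entry date, so a new span starts here.
            span_start = entry
        result[i] = span_start
    return result


originals = original_entry_dates(rows)
```

For the example rows this gives 2012-10-01 for the first three rows and 2001-02-02 for the last one, matching the original_entrydate column in the table.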

I solved my problem in the most clunky way imaginable, by creating nested if-else statements:

res_phys_levels['ORIGINAL_AdmDt']=''

for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:     #if there's only one instance of Individual ID in table, then ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:                 #if there's more than one instance of Individual_ID, then--
        j = i+1        
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else: 
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate: 
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else: 
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j+1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else: 
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate: 
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else: 
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k+1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[m]['Admit_Date']  # advance to row m (was k, which repeated the previous level)
                                n = m+1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

This worked for two reasons: the dataframe was sorted by 'Individual_ID' ascending and 'Admit_Date' descending, and the nested if-else statements allowed the 'Individual_ID' at index i to be compared with subsequent rows until all possibilities were exhausted.

I also knew there was a maximum of 4 rows per ID.

But please show me a better, more Pythonic way of doing this!
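For comparison, here is one possible direction, offered as a rough sketch rather than a tested solution. It reuses the column names from the loop above and relies on the same sort order, but compares each row with the next-older row of the same ID via a grouped shift, so it does not depend on the four-row limit. The sample frame `df` is illustrative: it holds the example rows from the table plus one made-up single-row ID (009999).

```python
import pandas as pd

# Illustrative data using the column names from the loop above; the values are
# the example rows from the table plus one made-up single-row ID.
df = pd.DataFrame(
    {
        "Individual_ID": ["003246"] * 4 + ["009999"],
        "Admit_Date": pd.to_datetime(
            ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02", "2020-01-15"]
        ),
        "SEPARATION_DATE": pd.to_datetime(
            [None, "2022-03-22", "2015-07-24", "2010-04-05", "2020-06-30"]
        ),
    }
)

# Same sort order the loop relies on: ID ascending, admit date descending.
df = df.sort_values(
    ["Individual_ID", "Admit_Date"], ascending=[True, False]
).reset_index(drop=True)

# Separation date of the next-older row for the same ID (NaT for the oldest row).
prev_exit = df.groupby("Individual_ID")["SEPARATION_DATE"].shift(-1)

# A row starts a span when its admit date does not equal that separation date.
# Comparisons against NaT are always unequal, so the oldest row always starts a span.
starts_span = df["Admit_Date"].ne(prev_exit)

# Keep Admit_Date only on span-start rows, then fill upward toward newer rows.
df["ORIGINAL_AdmDt"] = (
    df["Admit_Date"].where(starts_span).groupby(df["Individual_ID"]).bfill()
)
```

Grouping both the shift and the back-fill by 'Individual_ID' keeps dates from leaking across IDs, which is the job the repeated `start_ID != next_ID` checks do in the nested version.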
