
Python: find most recent date in one column with no matching date in another

I have two date columns recording clients' entries to and exits from facilities.

ID      entry_date  exit_date   original_entrydate
003246  2022-03-22  NaN         2012-10-01
003246  2015-07-24  2022-03-22  2012-10-01
003246  2012-10-01  2015-07-24  2012-10-01
003246  2001-02-02  2010-04-05  2001-02-02

For every ID in the table, I need to match each entry_date against the exit_date values to find the most recent entry date that starts an uninterrupted span of care, that is, a period in which the ID was moving between facilities but never leaving care. That date should be returned in a column, original_entrydate.

In the example, original_entrydate for the first three rows is 2012-10-01, because that entry_date does not match any exit_date. The mismatch indicates a separation from care, which the dates show lasted about two and a half years (2010-04-05 to 2012-10-01). For any older records of that ID, the process resets and finds the original_entrydate for the records preceding that separation, back to the next separation, as with the fourth row.
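The rule described above can be sketched in pandas roughly as follows. This is only a minimal sketch under stated assumptions: it uses the column names from the example table, sorts chronologically within each ID, and treats a row as the start of a new span whenever its entry_date differs from the previous row's exit_date. The helper column span_no and the variable names are made up for illustration.

```python
import pandas as pd

# Example rows from the table above, without the target column.
df = pd.DataFrame({
    "ID": ["003246"] * 4,
    "entry_date": ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02"],
    "exit_date": [None, "2022-03-22", "2015-07-24", "2010-04-05"],
})
df["entry_date"] = pd.to_datetime(df["entry_date"])
df["exit_date"] = pd.to_datetime(df["exit_date"])

# Work in chronological order within each ID.
df = df.sort_values(["ID", "entry_date"]).reset_index(drop=True)

# A row starts a new span when its entry_date differs from the previous
# row's exit_date for the same ID (the first row of an ID always does,
# because its previous exit_date is missing).
prev_exit = df.groupby("ID")["exit_date"].shift()
starts_span = df["entry_date"].ne(prev_exit)

# Running count of span starts numbers the spans within each ID.
# span_no is a hypothetical helper column, not part of the original data.
df["span_no"] = starts_span.astype(int).groupby(df["ID"]).cumsum()

# The earliest entry_date in each (ID, span) is the original entry date.
df["original_entrydate"] = (
    df.groupby(["ID", "span_no"])["entry_date"].transform("min")
)

print(df[["ID", "entry_date", "exit_date", "original_entrydate"]])
```

Because the span number is derived from a cumulative sum, this approach does not depend on how many rows an ID has.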

I solved my problem in the clunkiest way imaginable, by creating nested if-else statements:

res_phys_levels['ORIGINAL_AdmDt']=''

for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:     #if there's only one instance of Individual ID in table, then ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:                 #if there's more than one instance of Individual_ID, then--
        j = i+1        
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else: 
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate: 
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else: 
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j+1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else: 
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate: 
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else: 
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k+1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[m]['Admit_Date']
                                n = m+1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

This worked for two reasons: the dataframe was sorted by 'Individual_ID' ascending and 'Admit_Date' descending, and the nested if-else statements compared the 'Individual_ID' at index i with each subsequent row until all possibilities were exhausted.

I also knew there was a maximum of 4 rows per ID.
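For comparison, the same row-by-row walk can be written as a single loop that does not depend on the four-row limit. This is a sketch only, assuming the sort order described above (Individual_ID ascending, Admit_Date descending) and the column names from the code. The function name original_admit_dates is hypothetical, and the sample frame simply maps the example table's entry_date and exit_date onto Admit_Date and SEPARATION_DATE.

```python
import pandas as pd

def original_admit_dates(frame):
    """Return one ORIGINAL_AdmDt value per row of `frame`.

    Assumes `frame` is sorted by Individual_ID ascending and
    Admit_Date descending.
    """
    ids = frame["Individual_ID"].tolist()
    admits = frame["Admit_Date"].tolist()
    seps = frame["SEPARATION_DATE"].tolist()
    result = []
    for i in range(len(frame)):
        j = i
        # Step to the next (older) row while it belongs to the same ID
        # and its separation date equals the current row's admit date.
        while (j + 1 < len(frame)
               and ids[j + 1] == ids[i]
               and seps[j + 1] == admits[j]):
            j += 1
        result.append(admits[j])
    return result

# Sample frame built from the example table (column mapping is an assumption).
res = pd.DataFrame({
    "Individual_ID": ["003246"] * 4,
    "Admit_Date": pd.to_datetime(
        ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02"]),
    "SEPARATION_DATE": pd.to_datetime(
        [None, "2022-03-22", "2015-07-24", "2010-04-05"]),
})
res["ORIGINAL_AdmDt"] = original_admit_dates(res)
print(res)
```

The while loop replaces each extra level of nesting with one more iteration, and the bounds check keeps the last row of the table from reading past the end.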

But please show me a better, more Pythonic way of doing this!
