Python: find most recent date in one column with no matching date in another
I have two date columns representing entry into and exit from a client facility.
| ID | entry_date | exit_date | original_entrydate |
|---|---|---|---|
| 003246 | 2022-03-22 | NaN | 2012-10-01 |
| 003246 | 2015-07-24 | 2022-03-22 | 2012-10-01 |
| 003246 | 2012-10-01 | 2015-07-24 | 2012-10-01 |
| 003246 | 2001-02-02 | 2010-04-05 | 2001-02-02 |
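For every instance of an ID in the table, I need to match entry_date against exit_date to find the most recent entry date that marks the start of an uninterrupted span of time during which the ID moved between facilities without leaving care, and return it in a column, original_entrydate.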
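In the example, the first three rows have an original_entrydate of 2012-10-01, because that entry_date does not match an exit_date, which indicates a separation from care; the dates show the separation lasted two years and a few months (from 2010-04-05 to 2012-10-01). If the ID had further records, the process would reset and find the original_entrydate for any records preceding that separation from care, up to the next separation.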
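For a single ID, the rule above can be sketched in plain Python. This is a minimal sketch, assuming the stays are given as (entry_date, exit_date) pairs ordered newest first as in the example table, with ISO date strings compared for equality and None standing in for the missing exit date; the helper name `original_entry_dates` is hypothetical.

```python
from typing import List, Optional, Tuple

# (entry_date, exit_date); exit_date is None while the stay is still open
Stay = Tuple[str, Optional[str]]


def original_entry_dates(stays: List[Stay]) -> List[str]:
    """Hypothetical helper: return the original entry date for each stay of one ID.

    `stays` must be ordered newest first, like the example table. A stay
    continues the previous (older) stay when its entry_date equals that
    stay's exit_date; otherwise it starts a new uninterrupted span.
    """
    result: List[str] = [""] * len(stays)
    span_start = ""
    prev_exit: Optional[str] = None
    # Walk from the oldest stay to the newest one.
    for idx in range(len(stays) - 1, -1, -1):
        entry, exit_ = stays[idx]
        if entry != prev_exit:
            # No matching exit date on the older stay: a new span begins here.
            span_start = entry
        result[idx] = span_start
        prev_exit = exit_
    return result


example = [
    ("2022-03-22", None),
    ("2015-07-24", "2022-03-22"),
    ("2012-10-01", "2015-07-24"),
    ("2001-02-02", "2010-04-05"),
]
print(original_entry_dates(example))
# ['2012-10-01', '2012-10-01', '2012-10-01', '2001-02-02']
```

Walking from the oldest stay forward means each stay only has to look at the one immediately before it, so there is no fixed limit on how many stays an ID can have.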
I solved my problem in the clumsiest way imaginable, by writing nested if-else statements:
```python
# .at[i, ...] is label-based while .iloc[i] is position-based, so this relies on
# res_phys_levels having a default RangeIndex (0, 1, 2, ...).
# Note: iloc[i + 1] raises IndexError on the final row if its ID occurs more than once.
res_phys_levels['ORIGINAL_AdmDt'] = ''
for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:  # only one instance of Individual_ID in table, so ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:  # more than one instance of Individual_ID: compare with the following (older) rows
        j = i + 1
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else:
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate:
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else:
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j + 1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else:
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate:
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else:
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k + 1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[m]['Admit_Date']  # was iloc[k]: copy-paste bug
                                n = m + 1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
```
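This works for two reasons: the dataframe is sorted by "Individual_ID" ascending and "Admit_Date" descending, so the nested if-else statements can compare the "Individual_ID" at index[i] with the subsequent rows until all possibilities are exhausted; and I know that each ID has at most 4 rows.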
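The nesting can be avoided, because each row only needs to know whether its admit date equals the separation date of the previous (older) stay for the same ID. Below is a minimal pandas sketch of that idea using a different technique from the nested comparison (`groupby` + `shift` + `cumsum`). It assumes the question's column names (`Individual_ID`, `Admit_Date`, `SEPARATION_DATE`, `ORIGINAL_AdmDt`) and that the date columns hold datetimes with NaT for a missing separation date; the helper name `add_original_admit_date` is hypothetical.

```python
import pandas as pd


def add_original_admit_date(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add ORIGINAL_AdmDt, the first admit date of each
    uninterrupted span of care, to every row of df."""
    # Oldest stay first within each ID, so shift() brings in the previous stay.
    out = df.sort_values(["Individual_ID", "Admit_Date"]).copy()
    prev_sep = out.groupby("Individual_ID")["SEPARATION_DATE"].shift()
    # A new span starts when the admit date does not match the previous
    # separation date; each ID's first row always counts (prev_sep is NaT there).
    new_span = out["Admit_Date"].ne(prev_sep)
    # Cumulative sum gives every span a unique number, and spans never cross IDs
    # because each ID's first row starts a new span.
    span_no = new_span.cumsum()
    out["ORIGINAL_AdmDt"] = out.groupby(span_no)["Admit_Date"].transform("first")
    # Restore the question's ordering: ID ascending, Admit_Date descending.
    return out.sort_values(["Individual_ID", "Admit_Date"], ascending=[True, False])


df = pd.DataFrame({
    "Individual_ID": ["003246"] * 4,
    "Admit_Date": pd.to_datetime(["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02"]),
    "SEPARATION_DATE": pd.to_datetime([None, "2022-03-22", "2015-07-24", "2010-04-05"]),
})
result = add_original_admit_date(df)
print(result["ORIGINAL_AdmDt"].dt.strftime("%Y-%m-%d").tolist())
# ['2012-10-01', '2012-10-01', '2012-10-01', '2001-02-02']
```

Because spans are numbered with a cumulative sum instead of looking ahead a fixed number of rows, this sketch does not depend on the 4-rows-per-ID limit, and it avoids mixing positional `.iloc` with label-based `.at`.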
But please show me a better, more Pythonic way!