简体   繁体   English

如何根据每个 dataframe 中两个不同日期列的日期条件合并两个数据框?

[英]How to merge two dataframes based on a date condition from two different date columns in each dataframe?

I have two Dataframes in the form:我有两个数据框的形式:

Dataframe(df1):数据框(df1):

P_CLIENT_ID P_CLIENT_ID P_DATE_ENCOUNTER P_DATE_ENCOUNTER
25835 25835 2016-12-21 2016-12-21
25835 25835 2017-02-21 2017-02-21
25835 25835 2017-04-25 2017-04-25
25835 25835 2017-06-21 2017-06-21
25835 25835 2017-09-04 2017-09-04
25835 25835 2018-01-08 2018-01-08
25835 25835 2018-04-03 2018-04-03

Dataframe(df2):数据框(df2):

R_CLIENT_ID R_CLIENT_ID R_DATE_TESTED R_DATE_TESTED R_RESULT R_RESULT
25835 25835 2017-03-07 2017-03-07 20.0 20.0
25835 25835 2017-08-03 2017-08-03 20.0 20.0
25835 25835 2018-03-23 2018-03-23 20.0 20.0
25835 25835 2019-06-28 2019-06-28 20.0 20.0
25835 25835 2019-08-19 2019-08-19 42.0 42.0
25835 25835 2020-04-20 2020-04-20 40.0 40.0
25835 25835 2021-06-03 2021-06-03 20.0 20.0

I want to merge df2 onto df1 (main table) with the join key being P_CLIENT_ID and R_CLIENT_ID appending the most recent R_DATE_TESTED and R_RESULT我想将 df2 合并到 df1(主表)上,连接键为P_CLIENT_IDR_CLIENT_ID ,附加最新的R_DATE_TESTEDR_RESULT

First Condition: If R_DATE_TESTED > P_DATE_ENCOUNTER then nullify the R_DATE_TESTED, R_RESULT fields.第一个条件:如果R_DATE_TESTED > P_DATE_ENCOUNTER则使R_DATE_TESTED, R_RESULT字段无效。

Second Condition: If R_DATE_TESTED < P_DATE_ENCOUNTER then appned the most recent R_DATE_TESTED, R_RESULT fields to the dataframe with the final results as:第二个条件:如果R_DATE_TESTED < P_DATE_ENCOUNTER然后将最近的R_DATE_TESTED, R_RESULT字段附加到 dataframe,最终结果为:

The result from logic should be as below:逻辑结果应如下所示:

P_CLIENT_ID P_CLIENT_ID R_CLIENT_ID R_CLIENT_ID P_DATE_ENCOUNTER P_DATE_ENCOUNTER R_DATE_TESTED R_DATE_TESTED R_RESULT R_RESULT
25835 25835 25835.0 25835.0 2016-12-21 2016-12-21 NaN钠盐 NaN钠盐
25835 25835 25835.0 25835.0 2017-02-21 2017-02-21 NaN钠盐 NaN钠盐
25835 25835 25835.0 25835.0 2017-04-25 2017-04-25 2017-03-07 2017-03-07 20.0 20.0
25835 25835 25835.0 25835.0 2017-06-21 2017-06-21 2017-03-07 2017-03-07 20.0 20.0
25835 25835 25835.0 25835.0 2017-09-04 2017-09-04 2017-08-03 2017-08-03 20.0 20.0
25835 25835 25835.0 25835.0 2018-01-08 2018-01-08 2017-08-03 2017-08-03 20.0 20.0
25835 25835 25835.0 25835.0 2018-04-03 2018-04-03 2018-03-23 2018-03-23 20.0 20.0

Note: The actual dataset is quite large: df1 ~ 700000 rows and df2 ~ 125000 rows注意:实际数据集相当大:df1 ~ 700000 行和 df2 ~ 125000 行

Code attempt代码尝试

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'P_CLIENT_D': ['25835','25835','25835','25835','25835','25835','25835'],
                    'P_DATE_ENCOUNTER': ['2016-12-21','2017-02-21','2017-04-25','2017-06-21','2017-09-04','2018-01-08','2018-04-03']})

df2 = pd.DataFrame({'R_CLIENT_ID': ['25835','25835','25835','25835','25835','25835','25835'],
                    'R_DATE_TESTED': ['2017-03-07','2017-08-03','2018-03-23','2019-06-28','2019-08-19','2020-04-20','2021-06-03'],
                   'R_RESULT':[20,20,20,20,42,40,20]})

df_merged = pd.merge(df1, df2, left_on=['P_CLIENT_D'], right_on = ['R_CLIENT_ID'],  how='left')

df_merged = df_merged.drop_duplicates(subset=['P_CLIENT_D', 'P_DATE_ENCOUNTER'], keep='last')

df_merged['FLAG_LAB_AFTER_VISIT'] = 0
df_merged.loc[df_merged.R_DATE_TESTED >= df_merged.P_DATE_ENCOUNTER,'FLAG_LAB_AFTER_VISIT']=1
print(df_merged['FLAG_LAB_AFTER_VISIT'].sum(), 'future labs set to null')

#now the rows with flags - set all lab fields to null
df_merged.loc[df_merged['FLAG_LAB_AFTER_VISIT']==1, df2.columns] = np.nan

Try with pandas.merge_asof :尝试使用pandas.merge_asof

>>> pd.merge_asof(df1, 
                  df2, 
                  left_on="P_DATE_ENCOUNTER", 
                  right_on="R_DATE_TESTED", 
                  left_by="P_CLIENT_ID", 
                  right_by="R_CLIENT_ID")

   P_CLIENT_ID P_DATE_ENCOUNTER  R_CLIENT_ID R_DATE_TESTED  R_RESULT
0        25835       2016-12-21          NaN           NaT       NaN
1        25835       2017-02-21          NaN           NaT       NaN
2        25835       2017-04-25      25835.0    2017-03-07      20.0
3        25835       2017-06-21      25835.0    2017-03-07      20.0
4        25835       2017-09-04      25835.0    2017-08-03      20.0
5        25835       2018-01-08      25835.0    2017-08-03      20.0
6        25835       2018-04-03      25835.0    2018-03-23      20.0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM