
Check if a value in a dataframe exists in another dataframe with a condition

I have a pandas dataframe with a structure similar to:

Application | Account  |  Application_Date
1           | 444444   |  10/01/2018
2           | 444444   |  09/01/2018
3           | 555555   |  10/01/2018

And a different dataframe with a structure like this:

Case     | Account | Case_Date
1        | 444444  | 09/01/2018
2        | 444444  | 11/01/2018
3        | 444444  | 10/01/2018
4        | 555555  | 07/01/2018

I want to check whether the Account in the first dataframe exists in the second dataframe, but only where the Case_Date is greater than or equal to the Application_Date. The result should go in a new column in the first dataframe, along with the matching case numbers, like:

Application | Account  |  Application_Date | Case_Exists | Case_Number
1           | 444444   |  10/01/2018       |  Y          |  2, 3
2           | 444444   |  09/01/2018       |  Y          |  1, 2, 3
3           | 555555   |  10/01/2018       |  N          |

Could you please advise?

Thank you!

It's a bit of a convoluted solution, but it gets you there:

  1. Convert the date columns to proper datetime values.
  2. Merge the two dataframes (on Account, the only column they share).
  3. Keep the rows where the case date is greater than or equal to the application date, group by Account and Application, and collect the unique cases.
  4. Merge that result back into your first df.
  5. Assign Y where cases were found (non-null values) and N otherwise:

Setup:

>>> df1
   Application  Account Application_Date
0            1   444444       10/01/2018
1            2   444444       09/01/2018
2            3   555555       10/01/2018
>>> df2
   Case  Account   Case_Date
0     1   444444  09/01/2018
1     2   444444  11/01/2018
2     3   444444  10/01/2018
3     4   555555  07/01/2018

Process:

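import pandas as pd

# set to datetime (these strings are parsed month-first by default)
df1['Application_Date'] = pd.to_datetime(df1['Application_Date'])

df2['Case_Date'] = pd.to_datetime(df2['Case_Date'])

# first merge: joins on the shared column, Account
merged = df2.merge(df1)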

# loc and groupby
cases = (merged.loc[merged['Case_Date'] >= merged['Application_Date']]
         .groupby(['Account','Application'])['Case']
         .unique())

# merge back
final = (cases.to_frame('Case_Number')
         .merge(df1, left_index=True,
                right_on=['Account', 'Application'],
                how='outer')
         # Following line is just to re-adjust column order
         [['Application', 'Account', 'Application_Date', 'Case_Number']])

# assign Y and N
final['Case_Exists'] = final.Case_Number.notnull().map({True:'Y',False:'N'})

>>> final
   Application  Account Application_Date Case_Number Case_Exists
0            1   444444       2018-10-01      [2, 3]           Y
1            2   444444       2018-09-01   [1, 2, 3]           Y
2            3   555555       2018-10-01         NaN           N
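
The result above stores the case numbers as arrays, whereas the question asked for comma-separated text such as "2, 3" and an empty cell when nothing matches. Below is a minimal, self-contained sketch of one way to get that shape. It is a variation on the steps above, not the original answer's code. It assumes the dates are month/day/year and that each Application value appears only once in df1; the names matched, case_numbers and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Application": [1, 2, 3],
    "Account": [444444, 444444, 555555],
    "Application_Date": ["10/01/2018", "09/01/2018", "10/01/2018"],
})
df2 = pd.DataFrame({
    "Case": [1, 2, 3, 4],
    "Account": [444444, 444444, 444444, 555555],
    "Case_Date": ["09/01/2018", "11/01/2018", "10/01/2018", "07/01/2018"],
})

# Assumption: month/day/year. Use "%d/%m/%Y" if the data is day-first.
df1["Application_Date"] = pd.to_datetime(df1["Application_Date"], format="%m/%d/%Y")
df2["Case_Date"] = pd.to_datetime(df2["Case_Date"], format="%m/%d/%Y")

# Pair every application with every case on the same account,
# then keep only the cases dated on or after the application.
merged = df1.merge(df2, on="Account")
matched = merged[merged["Case_Date"] >= merged["Application_Date"]]

# One comma-separated string of case numbers per application.
# Assumption: Application is unique in df1, so it can serve as the group key.
case_numbers = (
    matched.sort_values("Case")
    .groupby("Application")["Case"]
    .agg(lambda s: ", ".join(map(str, s)))
    .rename("Case_Number")
)

# A left join keeps applications that have no matching case.
result = df1.merge(case_numbers, left_on="Application", right_index=True, how="left")
result["Case_Exists"] = result["Case_Number"].notna().map({True: "Y", False: "N"})
result["Case_Number"] = result["Case_Number"].fillna("")
result = result[["Application", "Account", "Application_Date",
                 "Case_Exists", "Case_Number"]]

print(result)
```

If Application is not unique on its own, grouping by both Account and Application, as in the answer above, is the safer choice.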
