Check if a value in a dataframe exists in another dataframe with a condition
I have a pandas dataframe with a structure similar to:
Application | Account | Application_Date
1 | 444444 | 10/01/2018
2 | 444444 | 09/01/2018
3 | 555555 | 10/01/2018
And a different dataframe with a structure like this:
Case | Account | Case_Date
1 | 444444 | 09/01/2018
2 | 444444 | 11/01/2018
3 | 444444 | 10/01/2018
4 | 555555 | 07/01/2018
I want to check whether the Account in the first dataframe exists in the second dataframe, but only where Case_Date is greater than or equal to Application_Date, and put the result in a column of the first dataframe together with the matching case numbers, like:
Application | Account | Application_Date | Case_Exists | Case_Number
1 | 444444 | 10/01/2018 | Y | 2, 3
2 | 444444 | 09/01/2018 | Y | 1, 2, 3
3 | 555555 | 10/01/2018 | N |
Could you please advise?
Thank you!
It's a bit of a convoluted solution, but it gets you there:
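1. Merge the two dataframes.
2. Keep only the rows where Case_Date >= Application_Date, group on Application and Account, and get the unique cases.
3. Merge the result back into df1.
4. Assign Y to the non-null values (where cases were found) and N to the rest:
>>> df1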
Application Account Application_Date
0 1 444444 10/01/2018
1 2 444444 09/01/2018
2 3 555555 10/01/2018
>>> df2
Case Account Case_Date
0 1 444444 09/01/2018
1 2 444444 11/01/2018
2 3 444444 10/01/2018
3 4 555555 07/01/2018
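import pandas as pd

# set to datetime so the dates compare chronologically, not as strings
df1['Application_Date'] = pd.to_datetime(df1['Application_Date'])
df2['Case_Date'] = pd.to_datetime(df2['Case_Date'])
# first merge (Account is the only column the two frames share)
merged = df2.merge(df1, on='Account')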
# keep cases dated on or after the application, then collect unique cases per group
cases = (merged.loc[merged['Case_Date'] >= merged['Application_Date']]
               .groupby(['Account', 'Application'])['Case']
               .unique())
# merge back; the outer join keeps applications that have no matching case
final = (cases.to_frame('Case_Number')
              .merge(df1, left_index=True,
                     right_on=['Account', 'Application'],
                     how='outer')
              # Following line is just to re-adjust column order
              [['Application', 'Account', 'Application_Date', 'Case_Number']])
# assign Y and N
final['Case_Exists'] = final.Case_Number.notnull().map({True:'Y',False:'N'})
>>> final
Application Account Application_Date Case_Number Case_Exists
0 1 444444 2018-10-01 [2, 3] Y
1 2 444444 2018-09-01 [1, 2, 3] Y
2 3 555555 2018-10-01 NaN N
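The result above stores the case numbers as arrays, while the question asked for a comma-separated value such as "2, 3" and an empty cell when nothing matches. Below is a minimal self-contained sketch of one way to get that shape. It is a variation on the answer, not part of it, and it rests on two assumptions: Application is unique in df1, and both date columns use the same format (the `format` string here matches the month-first parsing shown in the output above). The names `matches`, `case_numbers` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Application": [1, 2, 3],
    "Account": [444444, 444444, 555555],
    "Application_Date": ["10/01/2018", "09/01/2018", "10/01/2018"],
})
df2 = pd.DataFrame({
    "Case": [1, 2, 3, 4],
    "Account": [444444, 444444, 444444, 555555],
    "Case_Date": ["09/01/2018", "11/01/2018", "10/01/2018", "07/01/2018"],
})

# Assumption: both columns share one date format (month first, as in the output above)
df1["Application_Date"] = pd.to_datetime(df1["Application_Date"], format="%m/%d/%Y")
df2["Case_Date"] = pd.to_datetime(df2["Case_Date"], format="%m/%d/%Y")

# Pair every application with every case on the same account, then apply the date condition
merged = df1.merge(df2, on="Account")
matches = merged[merged["Case_Date"] >= merged["Application_Date"]]

# One comma-separated string of case numbers per application
# (assumes Application is unique in df1)
case_numbers = (
    matches.sort_values("Case")
    .groupby("Application")["Case"]
    .agg(lambda s: ", ".join(s.astype(str)))
)

result = df1.copy()
result["Case_Number"] = result["Application"].map(case_numbers)
result["Case_Exists"] = result["Case_Number"].notna().map({True: "Y", False: "N"})
result["Case_Number"] = result["Case_Number"].fillna("")
result = result[["Application", "Account", "Application_Date",
                 "Case_Exists", "Case_Number"]]
print(result)
```

Mapping the grouped strings back onto df1 by Application avoids the second merge, at the cost of relying on Application being a unique key. If it is not, grouping on both Account and Application as in the answer is the safer choice.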