Check if a value in a dataframe exists in another dataframe with a condition

I have a pandas dataframe with a structure similar to:

Application | Account  |  Application_Date
1           | 444444   |  10/01/2018
2           | 444444   |  09/01/2018
3           | 555555   |  10/01/2018

And a different dataframe with a structure like this:

Case     | Account | Case_Date
1        | 444444  | 09/01/2018
2        | 444444  | 11/01/2018
3        | 444444  | 10/01/2018
4        | 555555  | 07/01/2018

I want to check whether the Account in the first dataframe exists in the second dataframe, but only where the Case_Date is greater than or equal to the Application_Date. I want the result in a column in the first dataframe, along with the case numbers, like:

Application | Account  |  Application_Date | Case_Exists | Case_Number
1           | 444444   |  10/01/2018       |  Y          |  2, 3
2           | 444444   |  09/01/2018       |  Y          |  1, 2, 3
3           | 555555   |  10/01/2018       |  N          |

Could you please advise?

Thank you!

It's a bit of a convoluted solution, but it gets you there:

  1. Convert the date columns to proper datetimes.
  2. Merge your two dataframes.
  3. Locate the rows where the case date is greater than or equal to the application date, group by Application and Account, and get the unique cases.
  4. Merge the result of that back into your first dataframe.
  5. Assign Y to the non-null values (where cases were found) and N to the rest.

Setup:

>>> df1
   Application  Account Application_Date
0            1   444444       10/01/2018
1            2   444444       09/01/2018
2            3   555555       10/01/2018
>>> df2
   Case  Account   Case_Date
0     1   444444  09/01/2018
1     2   444444  11/01/2018
2     3   444444  10/01/2018
3     4   555555  07/01/2018
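
The setup above only shows the printed frames. To reproduce it, they could be built roughly as follows. This is a sketch that assumes the dates start out as plain strings, which is why the conversion step below is needed:

```python
import pandas as pd

# Rebuild the two example frames from the question. Dates are kept as
# strings here so that the datetime conversion step still applies.
df1 = pd.DataFrame({
    "Application": [1, 2, 3],
    "Account": [444444, 444444, 555555],
    "Application_Date": ["10/01/2018", "09/01/2018", "10/01/2018"],
})

df2 = pd.DataFrame({
    "Case": [1, 2, 3, 4],
    "Account": [444444, 444444, 444444, 555555],
    "Case_Date": ["09/01/2018", "11/01/2018", "10/01/2018", "07/01/2018"],
})
```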

Process:

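import pandas as pd

# set to datetime (dates are read month-first, e.g. 10/01/2018 -> 2018-10-01)
df1['Application_Date'] = pd.to_datetime(df1['Application_Date'])
df2['Case_Date'] = pd.to_datetime(df2['Case_Date'])

# first merge, on the shared Account column
merged = df2.merge(df1, on='Account')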

# loc and groupby
cases = (merged.loc[merged['Case_Date'] >= merged['Application_Date']]
         .groupby(['Account','Application'])['Case']
         .unique())

# merge back
final = (cases.to_frame('Case_Number').merge(df1,left_index=True,
                                right_on=['Account', 'Application'],
                                how='outer')
         # Following line is just to re-adjust column order
         [['Application','Account','Application_Date','Case_Number']])

# assign Y and N
final['Case_Exists'] = final.Case_Number.notnull().map({True:'Y',False:'N'})

>>> final
   Application  Account Application_Date Case_Number Case_Exists
0            1   444444       2018-10-01      [2, 3]           Y
1            2   444444       2018-09-01   [1, 2, 3]           Y
2            3   555555       2018-10-01         NaN           N
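
The result above holds the case numbers as arrays, while the question asked for a comma-separated value such as `2, 3`. A variation on the same merge-and-filter idea could produce that directly. This is only a sketch: the function name `flag_cases` is made up for illustration, it assumes the dates are month/day/year (matching the output above), and it assumes each Application number appears only once in the first dataframe.

```python
import pandas as pd


def flag_cases(apps: pd.DataFrame, cases: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `apps` with Case_Exists and a comma-separated Case_Number."""
    apps = apps.copy()
    cases = cases.copy()

    # Assumption: dates are month/day/year strings such as 10/01/2018.
    apps["Application_Date"] = pd.to_datetime(
        apps["Application_Date"], format="%m/%d/%Y")
    cases["Case_Date"] = pd.to_datetime(cases["Case_Date"], format="%m/%d/%Y")

    # Pair every application with every case on the same account, then keep
    # only the cases dated on or after the application.
    merged = apps.merge(cases, on="Account", how="inner")
    on_or_after = merged[merged["Case_Date"] >= merged["Application_Date"]]

    # One string per application, case numbers in ascending order.
    joined = (
        on_or_after.sort_values("Case")
        .groupby("Application")["Case"]
        .agg(lambda s: ", ".join(str(c) for c in s.unique()))
    )

    # Applications with no qualifying case get an empty string and an N.
    apps["Case_Number"] = apps["Application"].map(joined).fillna("")
    apps["Case_Exists"] = (
        apps["Application"].isin(joined.index).map({True: "Y", False: "N"})
    )
    return apps[["Application", "Account", "Application_Date",
                 "Case_Exists", "Case_Number"]]
```

Calling `flag_cases(df1, df2)` on the frames from the setup should give `Y` with `2, 3` for application 1, `Y` with `1, 2, 3` for application 2, and `N` with an empty string for application 3.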
