Filter on max date less than another date

I have a DataFrame of people and jobs, where the unit associated with a job can change throughout the year. For each person, position, and report date, how do I keep the row with the latest unit date that is earlier than the report date?

My DataFrame looks like this:

person_id   report_date     position_no     unit_date   unit
1           10/1/2017       123456          9/1/2017    789
1           10/1/2017       123456          9/10/2017   657
2           10/1/2017       251566          8/1/2017    123
2           10/1/2017       251566          8/1/2016    123
1           10/1/2018       123456          1/1/2018    541
1           10/1/2018       123456          2/1/2018    365
2           10/1/2018       251566          12/1/2017   155
2           10/1/2018       251566          3/1/2018    355

Here's my desired output:

person_id   report_date     position_no     unit_date   unit
1           10/1/2017       123456          9/10/2017   657
2           10/1/2017       251566          8/1/2017    123
1           10/1/2018       123456          2/1/2018    365
2           10/1/2018       251566          3/1/2018    355

I'm new to using a lambda with filter. I had hoped that something like this would work, but it doesn't:

df.groupby(['person_id','report_date','position_no']).filter(lambda x: x['unit_date'].max() < x['report_date'])

Setup

df.report_date = pd.to_datetime(df.report_date)
df.unit_date = pd.to_datetime(df.unit_date)
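
For a reproducible run, the sample data from the question can be rebuilt as below. This is only a sketch: how the original frame was loaded is not shown, so the explicit construction and the `%m/%d/%Y` date format are assumptions based on the table in the question.

```python
import pandas as pd

# Sample rows copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "person_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "report_date": ["10/1/2017"] * 4 + ["10/1/2018"] * 4,
    "position_no": [123456, 123456, 251566, 251566] * 2,
    "unit_date": ["9/1/2017", "9/10/2017", "8/1/2017", "8/1/2016",
                  "1/1/2018", "2/1/2018", "12/1/2017", "3/1/2018"],
    "unit": [789, 657, 123, 123, 541, 365, 155, 355],
})

# Parse both date columns as month/day/year so they compare as dates, not strings.
df["report_date"] = pd.to_datetime(df["report_date"], format="%m/%d/%Y")
df["unit_date"] = pd.to_datetime(df["unit_date"], format="%m/%d/%Y")
```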

groupby(...).filter keeps or drops whole groups based on a single boolean per group, so it cannot pick one row out of each group. It's better to skip the lambda and use a plain comparison followed by idxmax:

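# Boolean mask: rows whose unit_date is strictly before the report_date
m = df['unit_date'] < df['report_date']
# For each group, the index label of the row with the latest remaining unit_date
u = df.loc[m].groupby(['person_id', 'position_no', 'report_date'])['unit_date'].idxmax()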

df.loc[u]

   person_id report_date  position_no  unit_date  unit
1          1  2017-10-01       123456 2017-09-10   657
5          1  2018-10-01       123456 2018-02-01   365
2          2  2017-10-01       251566 2017-08-01   123
7          2  2018-10-01       251566 2018-03-01   355

If you want the row order to match the original DataFrame:

df.loc[u.sort_values()]

   person_id report_date  position_no  unit_date  unit
1          1  2017-10-01       123456 2017-09-10   657
2          2  2017-10-01       251566 2017-08-01   123
5          1  2018-10-01       123456 2018-02-01   365
7          2  2018-10-01       251566 2018-03-01   355
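
As a cross-check, the same selection can probably also be written with sort_values and drop_duplicates instead of idxmax. This is an alternative sketch, not part of the original answer; the helper name `latest_unit_before_report` and the inline sample data are hypothetical and introduced only for illustration.

```python
import pandas as pd

def latest_unit_before_report(df):
    """Hypothetical helper: per (person, position, report date), keep the row
    with the latest unit_date that is strictly before report_date."""
    keys = ["person_id", "position_no", "report_date"]
    # Drop rows whose unit_date is not earlier than the report_date.
    earlier = df[df["unit_date"] < df["report_date"]]
    # After sorting by unit_date, the last row of each key group is the latest one.
    latest = earlier.sort_values("unit_date").drop_duplicates(keys, keep="last")
    # Restore the original row order.
    return latest.sort_index()

# Sample rows copied from the question (construction is an assumption).
sample = pd.DataFrame({
    "person_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "report_date": ["10/1/2017"] * 4 + ["10/1/2018"] * 4,
    "position_no": [123456, 123456, 251566, 251566] * 2,
    "unit_date": ["9/1/2017", "9/10/2017", "8/1/2017", "8/1/2016",
                  "1/1/2018", "2/1/2018", "12/1/2017", "3/1/2018"],
    "unit": [789, 657, 123, 123, 541, 365, 155, 355],
})
sample["report_date"] = pd.to_datetime(sample["report_date"], format="%m/%d/%Y")
sample["unit_date"] = pd.to_datetime(sample["unit_date"], format="%m/%d/%Y")

result = latest_unit_before_report(sample)
print(result)
```

One difference worth noting: groups with no unit_date before the report date simply disappear from the result, which is also what happens with the mask plus idxmax approach above.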
