Pandas: delete rows between two dates, grouped by person
I have two dataframes: available_df and delete_df.
<available_df>
Person start_day end_day available
1 2012-07-13 2012-07-27 0
1 2012-07-20 2012-08-03 0
1 2012-07-27 2012-08-10 0
2 2012-05-06 2012-05-20 0
2 2012-05-13 2012-05-27 0
2 2012-06-20 2012-07-03 0
2 2012-06-27 2012-07-10 0
2 2012-07-04 2012-07-11 0
<delete_df>
Person start_day end_day
1 2012-05-18 2012-05-24
1 2012-07-13 2012-07-20
2 2012-05-18 2012-06-23
<wanted_results>
Person start_day end_day available
1 2012-07-27 2012-08-10 0
2 2012-06-27 2012-07-10 0
2 2012-07-04 2012-07-11 0
What I want to do is group the records in available_df by person and keep only the rows that do not fall within that person's delete_df periods. If a period in delete_df (start_day to end_day) overlaps a row in available_df for the same person, that row should be deleted.
I tried using enumerate, but it did not work. Can anyone help me?
Thank you.
I would use pandas.merge_asof here:
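import pandas as pd

# merge_asof needs both frames sorted on their keys (datetime columns assumed)
out = (pd
   .merge_asof(available_df.sort_values('end_day'),
               delete_df.sort_values('start_day'),
               by='Person', left_on='end_day', right_on='start_day',
               suffixes=(None, '_'),
               )
   # keep rows that start after the matched delete period ends
   .loc[lambda d: d['start_day'].gt(d['end_day_'])]
   .drop(columns=['start_day_', 'end_day_'])
)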
Output:
Person start_day end_day available
3 2 2012-06-27 2012-07-10 0
4 2 2012-07-04 2012-07-11 0
7 1 2012-07-27 2012-08-10 0
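For comparison, here is a minimal alternative sketch that does not use merge_asof. It rebuilds the sample frames from the question and drops every available_df row that overlaps any delete period of the same person, using an ordinary merge on Person. The helper name drop_overlapping, the row_id column, and the inclusive overlap rule are assumptions made for illustration; they are not part of the answer above. One difference that may matter: a person with no rows in delete_df keeps all rows here, whereas in the merge_asof version an unmatched row gets NaT in end_day_, and the gt comparison then filters it out.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
available_df = pd.DataFrame({
    "Person": [1, 1, 1, 2, 2, 2, 2, 2],
    "start_day": pd.to_datetime([
        "2012-07-13", "2012-07-20", "2012-07-27", "2012-05-06",
        "2012-05-13", "2012-06-20", "2012-06-27", "2012-07-04",
    ]),
    "end_day": pd.to_datetime([
        "2012-07-27", "2012-08-03", "2012-08-10", "2012-05-20",
        "2012-05-27", "2012-07-03", "2012-07-10", "2012-07-11",
    ]),
    "available": 0,
})
delete_df = pd.DataFrame({
    "Person": [1, 1, 2],
    "start_day": pd.to_datetime(["2012-05-18", "2012-07-13", "2012-05-18"]),
    "end_day": pd.to_datetime(["2012-05-24", "2012-07-20", "2012-06-23"]),
})


def drop_overlapping(available, delete):
    """Hypothetical helper: drop rows overlapping any delete period per person."""
    # Pair each available row with every delete period of the same person,
    # carrying the original index along as the column "row_id".
    pairs = (available.rename_axis("row_id").reset_index()
             .merge(delete, on="Person", suffixes=("", "_del")))
    # Inclusive overlap test: the two intervals share at least one day.
    overlaps = ((pairs["start_day"] <= pairs["end_day_del"])
                & (pairs["end_day"] >= pairs["start_day_del"]))
    # Remove every original row that overlapped at least one delete period.
    return available.drop(index=pairs.loc[overlaps, "row_id"].unique())


result = drop_overlapping(available_df, delete_df)
print(result)
```

Because the merge pairs every row with every delete period of the same person, the intermediate frame can grow large when delete_df has many periods per person; for big inputs the merge_asof approach is likely to be cheaper.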