Remove the rows of a dataframe based on date and flag condition in pandas

I have a dataframe:

import pandas as pd

df = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","14-02-2022","D","FALSE"],
                   ["A","14-02-2022","E","FALSE"],["A","16-02-2022","A","TRUE"],["A","16-02-2022","F","FALSE"],
                   ["A","17-02-2022","G","FALSE"],["A","17-02-2022","H","FALSE"],["A","18-02-2022","I","FALSE"],
                   ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   14-02-2022  D   FALSE
A   14-02-2022  E   FALSE
A   16-02-2022  A   TRUE
A   16-02-2022  F   FALSE
A   17-02-2022  G   FALSE
A   17-02-2022  H   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

I want to remove all rows for the day where flag is TRUE, as well as all rows for the previous working day and the next working day.

For example, here the flag is TRUE on 16 Feb, so all rows for the previous working day (14 Feb), the next working day (17 Feb) and 16 Feb itself should be removed. If TRUE falls on the last day of the month (28 Feb), where there is no next working day, then only the rows of the TRUE flag day and the previous working day should be removed.

Expected output:

df_out = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","18-02-2022","I","FALSE"],
                       ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

How can I do it?

You can use boolean indexing:

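# ensure boolean and datetime
df['flag'] = df['flag'].eq('TRUE')
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# dates on which at least one row is flagged
dates = df.loc[df['flag'], 'date']

bday = pd.offsets.BusinessDay(1)

# calendar business days right after and right before each flagged date
drop = pd.concat([dates+bday, dates-bday])

# keep rows that are neither flagged nor on an adjacent business day
out = df[~(df['date'].isin(drop) | df['flag'])]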

Output:

  id1       date id2   flag
0   A 2022-02-13   B  False
1   A 2022-02-13   C  False
2   A 2022-02-14   D  False
3   A 2022-02-14   E  False
5   A 2022-02-16   F  False
8   A 2022-02-18   I  False
9   A 2022-02-18   J  False
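
Note that the output above still contains the 14 Feb rows and the unflagged 16 Feb row (F). The snippet only drops the flagged rows themselves plus rows on the adjacent calendar business days, and pd.offsets.BusinessDay follows the Monday to Friday calendar rather than the dates present in the data. A small check of what the offset returns for the flagged date in the question (the names flag_day, prev_day and next_day are only illustrative):

```python
import pandas as pd

# The flagged date from the question; 16 Feb 2022 is a Wednesday.
flag_day = pd.Timestamp("2022-02-16")
bday = pd.offsets.BusinessDay(1)

# BusinessDay moves along the Mon-Fri calendar, not along the dates in df.
prev_day = flag_day - bday
next_day = flag_day + bday

print(flag_day.day_name())  # Wednesday
print(prev_day.date())      # 2022-02-15 (a Tuesday, which is absent from the data)
print(next_day.date())      # 2022-02-17
```

So 17 Feb is matched and removed, while 15 Feb matches nothing and the 14 Feb rows stay.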

You can try to create a filter data frame and select everything that is not in it:

df['date'] = pd.to_datetime(df['date'], format="%d-%m-%Y")

dates = df[df.flag == 'TRUE']['date']
to_drop = pd.concat([dates, dates + pd.offsets.BusinessDay(1), dates - pd.offsets.BusinessDay(1)])
df_out = df[~df['date'].isin(to_drop)]
df_out
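
If "previous/next working day" is meant as the nearest earlier/later date that actually occurs in the data, which is what the expected output in the question suggests (14 Feb is removed although the calendar business day before 16 Feb is 15 Feb), a minimal sketch could shift a flag mask over the sorted unique dates instead of using a business-day offset. This reading of the question is an assumption, the names days, flagged and near_flag are only illustrative, and the sketch treats the whole frame as one group rather than handling each id1 separately:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "13-02-2022", "B", "FALSE"], ["A", "13-02-2022", "C", "FALSE"],
     ["A", "14-02-2022", "D", "FALSE"], ["A", "14-02-2022", "E", "FALSE"],
     ["A", "16-02-2022", "A", "TRUE"], ["A", "16-02-2022", "F", "FALSE"],
     ["A", "17-02-2022", "G", "FALSE"], ["A", "17-02-2022", "H", "FALSE"],
     ["A", "18-02-2022", "I", "FALSE"], ["A", "18-02-2022", "J", "FALSE"]],
    columns=["id1", "date", "id2", "flag"],
)
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Assumption: every date present in the data counts as a working day.
days = df["date"].drop_duplicates().sort_values().reset_index(drop=True)

# True for each day that has at least one flagged row.
flagged = days.isin(df.loc[df["flag"].eq("TRUE"), "date"])

# Also mark the day just after and the day just before each flagged day.
# fill_value=False covers a flag on the first or last day in the data.
near_flag = (
    flagged
    | flagged.shift(1, fill_value=False)
    | flagged.shift(-1, fill_value=False)
)

out = df[~df["date"].isin(days[near_flag])]
print(out)  # only the 13 Feb and 18 Feb rows (id2 B, C, I, J) remain
```

When the flag sits on the last date in the data, the shift simply has no following day to mark, so only the flagged day and the one before it are dropped.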
