How to drop pandas dataframe rows based on conditions that consider other dataframe
How to drop specific pandas dataframe rows based on complex conditions
I have this pandas dataframe called df:
df
time entry
0 2022-07-28 13:35:00 True
1 2022-07-29 14:15:00 True
Name: time, dtype: datetime64[ns]
The "entry" column in df is always True.
Sample code to generate it:
import pandas as pd
tbl = {"time" :["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
"entry" : [True, True]}
df = pd.DataFrame(tbl)
df.sort_values(by = "time", inplace=True)
I also have another dataframe that starts at the times in df but contains more timestamps. Let's call it df2:
df2
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:15 True NaN NaN
2 2022-07-28 13:35:30 NaN NaN True
3 2022-07-28 13:35:45 True NaN NaN
. .
. .
n 2022-07-29 14:15:00 True NaN NaN
n+1 2022-07-29 14:15:15 True NaN NaN
n+2 2022-07-29 14:15:30 True NaN NaN
n+3 2022-07-29 14:15:45 NaN True NaN
n+4 2022-07-29 14:16:00 True NaN NaN
n+5 2022-07-29 14:16:15 NaN True NaN
Sample code to generate it:
tbl2 = {"time" :["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
"2022-07-28 13:35:45", "2022-07-29 14:15:00","2022-07-29 14:15:15",
"2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00", "2022-07-29 14:16:15"],
"entry" : [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
"target_long" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
"stop_long" : ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"]}
df2 = pd.DataFrame(tbl2)
df2.sort_values(by = "time", inplace=True)  # the column is named "time"; by="date" raises a KeyError
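Whenever "entry" in df2 is NaN and either "stop_long" or "target_long" is True, I need to keep that row and drop all the other rows of df2. However, if the time of a df2 row is also in df, that row must not be dropped, and the same process starts again from there.
The result would be a dataframe like this: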
df3
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:30 NaN NaN True
2 2022-07-29 14:15:00 True NaN NaN
3 2022-07-29 14:15:45 NaN True NaN
Any ideas?
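One way to read the rule above is as a small state machine: a row whose time appears in df opens a trade and is kept, the first later row with "entry" NaN and "stop_long" or "target_long" True closes it and is kept, and everything else is dropped. The sketch below only illustrates that reading and is not taken from the original post; the names entry_times, is_exit, in_trade and keep are hypothetical, and the "NaN" strings from the sample data are treated as "not True".

```python
import pandas as pd

# Sample data from the question ("NaN" is a plain string there).
df = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
    "entry": [True, True],
})
df2 = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
             "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
             "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00",
             "2022-07-29 14:16:15"],
    "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
    "target_long": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
    "stop_long": ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"],
})

df["time"] = pd.to_datetime(df["time"])
df2["time"] = pd.to_datetime(df2["time"])
df2 = df2.sort_values("time").reset_index(drop=True)

# Times that open a trade (assumption: an exact timestamp match with df).
entry_times = set(df["time"])

# A row is an exit candidate when "entry" is not True and a stop or target was hit.
is_exit = df2["entry"].ne(True) & (df2["stop_long"].eq(True) | df2["target_long"].eq(True))

keep = []
in_trade = False
for t, exit_row in zip(df2["time"], is_exit):
    if t in entry_times:
        keep.append(True)    # entry row: always kept, (re)opens the trade
        in_trade = True
    elif in_trade and exit_row:
        keep.append(True)    # first exit after the entry: kept, closes the trade
        in_trade = False
    else:
        keep.append(False)   # everything else is dropped

df3 = df2[keep].reset_index(drop=True)
print(df3)
```

An explicit loop is slow on large frames, but it makes the intended behaviour easy to check against the expected df3 before trying a vectorized version.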
Edit: I tried the solutions from both answers, but there is one case they did not consider, so I updated the sample code.
df.time = pd.to_datetime(df.time)
df2.time = pd.to_datetime(df2.time)
df = df.set_index('time')
df2 = df2.set_index('time')
df = df.replace('NaN', False).astype(bool)
df2 = df2.replace('NaN', False).astype(bool)
df3 = (df2.groupby(df2.index.date)
.apply(lambda x: x[~x.entry & (x.target_long | x.stop_long) | x.index.isin(df.index)]
[lambda y: y[(y.index <= y.target_long.idxmax()) | (y.index <= y.stop_long.idxmax())]])
.droplevel(-2)
.dropna(how='all')
.reset_index())
print(df3)
Output:
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:30 NaN NaN True
2 2022-07-29 14:15:00 True NaN NaN
3 2022-07-29 14:15:45 NaN True NaN
Try:
df3 = df2[df2["time"].isin(df["time"]) | ((df2['entry'] == "NaN") & ((df2['stop_long'] == True) | (df2['target_long'] == True)))]
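On the updated sample data, the mask above also keeps the 14:16:15 row, because every row with "entry" NaN and a stop or target hit passes the filter, not just the first one after each entry. A possible extension, sketched here under the assumption that only the first exit after each entry should survive, numbers the blocks between entries with cumsum and keeps the first exit inside each block. The names is_entry, is_exit, block and first_exit are hypothetical and not from the original answers.

```python
import pandas as pd

# Sample data from the question ("NaN" is a plain string there).
df = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
    "entry": [True, True],
})
df2 = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
             "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
             "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00",
             "2022-07-29 14:16:15"],
    "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
    "target_long": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
    "stop_long": ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"],
})

df["time"] = pd.to_datetime(df["time"])
df2["time"] = pd.to_datetime(df2["time"])
df2 = df2.sort_values("time").reset_index(drop=True)

# Rows whose timestamp also appears in df open a new block.
is_entry = df2["time"].isin(df["time"])

# Exit candidates: "entry" is not True and a stop or target was hit.
is_exit = (
    df2["entry"].ne(True)
    & (df2["stop_long"].eq(True) | df2["target_long"].eq(True))
    & ~is_entry
)

# Block number: 0 before the first entry, then 1, 2, ... after each entry.
block = is_entry.cumsum()

# Running count of exits inside each block; the first exit has count 1.
exit_count = is_exit.astype(int).groupby(block).cumsum()
first_exit = is_exit & (block > 0) & (exit_count == 1)

df3 = df2[is_entry | first_exit].reset_index(drop=True)
print(df3)
```

Grouping by the cumulative count of entries avoids a Python-level loop, which may matter on tick-level data. It does assume df2 is sorted by time before the masks are built.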