How to drop specific pandas dataframe rows based on complex conditions that consider another dataframe
I have this pandas dataframe called df:
df
time entry
0 2022-07-28 13:35:00 True
1 2022-07-29 14:15:00 True
Name: time, dtype: datetime64[ns]
The "entry" column in df is always True.
Sample code to generate it:
import pandas as pd
tbl = {"time" :["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
"entry" : [True, True]}
df = pd.DataFrame(tbl)
df.sort_values(by = "time", inplace=True)
I also have another dataframe that starts at the same time as df but contains many more timestamps. Let's call it df2:
df2
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:15 True NaN NaN
2 2022-07-28 13:35:30 NaN NaN True
3 2022-07-28 13:35:45 True NaN NaN
. .
. .
n 2022-07-29 14:15:00 True NaN NaN
n+1 2022-07-29 14:15:15 True NaN NaN
n+2 2022-07-29 14:15:30 True NaN NaN
n+3 2022-07-29 14:15:45 NaN True NaN
n+4 2022-07-29 14:16:00 True NaN NaN
n+5 2022-07-29 14:16:15 NaN True NaN
Sample code to generate it:
tbl2 = {"time" :["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
"2022-07-28 13:35:45", "2022-07-29 14:15:00","2022-07-29 14:15:15",
"2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00", "2022-07-29 14:16:15"],
"entry" : [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
"target_long" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
"stop_long" : ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"]}
df2 = pd.DataFrame(tbl2)
df2.sort_values(by = "time", inplace=True)
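Note that the sample code above stores the string "NaN" rather than a real missing value, so checks such as isna() will not see those cells as missing. A minimal sketch of one way to normalize them, assuming the small frame named raw below (a hypothetical two-row stand-in for df2):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for df2, using the same "NaN" strings.
raw = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-28 13:35:30"],
    "entry": [True, "NaN"],
    "stop_long": ["NaN", True],
})

# Swap the literal string "NaN" for a real missing value.
clean = raw.replace("NaN", np.nan)
# Parse the timestamps so comparisons and sorting work on datetimes.
clean["time"] = pd.to_datetime(clean["time"])

print(raw["entry"].isna().tolist())    # [False, False]
print(clean["entry"].isna().tolist())  # [False, True]
```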
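Starting at each row of df2 whose time also appears in df, I need to keep that row and then drop every following row until the first one where "entry" is NaN and either "stop_long" or "target_long" is True; that row is kept too. All rows after it are dropped as well, until the next row whose time appears in df. That row is not dropped, and the same process starts again from there.
The result would be a dataframe like this: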
df3
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:30 NaN NaN True
2 2022-07-29 14:15:00 True NaN NaN
3 2022-07-29 14:15:45 NaN True NaN
Any ideas?
Edit: I tried the solutions from both answers, but there is one case they did not consider, so I updated the sample code.
df.time = pd.to_datetime(df.time)
df2.time = pd.to_datetime(df2.time)
df = df.set_index('time')
df2 = df2.set_index('time')
df = df.replace('NaN', False).astype(bool)
df2 = df2.replace('NaN', False).astype(bool)
df3 = (df2.groupby(df2.index.date)
.apply(lambda x: x[~x.entry & (x.target_long | x.stop_long) | x.index.isin(df.index)]
[lambda y: y[(y.index <= y.target_long.idxmax()) | (y.index <= y.stop_long.idxmax())]])
.droplevel(-2)
.dropna(how='all')
.reset_index())
print(df3)
Output:
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:30 NaN NaN True
2 2022-07-29 14:15:00 True NaN NaN
3 2022-07-29 14:15:45 NaN True NaN
Try this:
df3 = df2[df2["time"].isin(df["time"]) | ((df2['entry'] == "NaN") & ((df2['stop_long'] == True) | (df2['target_long'] == True)))]
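The one-line filter above keeps every row that matches the exit condition, so on the sample data it would also keep the 14:16:15 row, which the expected df3 leaves out. The following is only a sketch of one possible alternative, not taken from either answer: it numbers the segments opened by each time found in df and keeps only the first exit row per segment. The helper names flags, is_start, is_exit, segment and first_exit are made up for illustration.

```python
import pandas as pd

# Sample data from the question ("NaN" is a plain string there).
df = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
    "entry": [True, True],
})
df2 = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
             "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
             "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00",
             "2022-07-29 14:16:15"],
    "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
    "target_long": ["NaN"] * 7 + [True, "NaN", True],
    "stop_long": ["NaN", "NaN", True] + ["NaN"] * 7,
})

df["time"] = pd.to_datetime(df["time"])
df2["time"] = pd.to_datetime(df2["time"])
df2 = df2.sort_values("time").reset_index(drop=True)

# Boolean view of the flag columns: True only where the cell equals True.
flags = df2[["entry", "target_long", "stop_long"]].eq(True)

# Rows whose time appears in df open a new segment.
is_start = df2["time"].isin(df["time"])
# Exit rows: no entry, and either the target or the stop was hit.
is_exit = ~flags["entry"] & (flags["target_long"] | flags["stop_long"])

# Segment id per row; rows before the first start get id 0.
segment = is_start.cumsum()

# Count exit rows inside each segment and keep only the first one.
exit_count = is_exit.astype(int).groupby(segment).cumsum()
first_exit = is_exit & (exit_count == 1) & (segment > 0)

df3 = df2[is_start | first_exit].reset_index(drop=True)
print(df3)
```

On the sample data this keeps the rows at 13:35:00, 13:35:30, 14:15:00 and 14:15:45. The flag columns in df3 still hold the original "NaN" strings, since the boolean view is only used to build the mask.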