
How to drop specific pandas dataframe rows based on complex conditions

I have this pandas dataframe called df:

df

                  time  entry
0  2022-07-28 13:35:00   True
1  2022-07-29 14:15:00   True

(time has dtype datetime64[ns].)

In df, "entry" is always True.

Sample code to generate it:

import pandas as pd

tbl = {"time": ["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
       "entry": [True, True]}

df = pd.DataFrame(tbl)
df.sort_values(by="time", inplace=True)

I also have another dataframe that starts at the first time in df but contains many more timestamps; let's call it df2:

df2 

                    time  entry  target_long  stop_long
0    2022-07-28 13:35:00   True          NaN        NaN
1    2022-07-28 13:35:15   True          NaN        NaN
2    2022-07-28 13:35:30    NaN          NaN       True
3    2022-07-28 13:35:45   True          NaN        NaN
...
n    2022-07-29 14:15:00   True          NaN        NaN
n+1  2022-07-29 14:15:15   True          NaN        NaN
n+2  2022-07-29 14:15:30   True          NaN        NaN
n+3  2022-07-29 14:15:45    NaN         True        NaN
n+4  2022-07-29 14:16:00   True          NaN        NaN
n+5  2022-07-29 14:16:15    NaN         True        NaN

Sample code to generate it:

# Note: "NaN" here is the string "NaN", not a real missing value
tbl2 = {"time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
                 "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
                 "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00", "2022-07-29 14:16:15"],
        "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
        "target_long": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
        "stop_long": ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"]}

df2 = pd.DataFrame(tbl2)
df2.sort_values(by="time", inplace=True)  # there is no "date" column, so sort by "time"
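In this sample the missing flags are the string "NaN" rather than a real missing value, so pd.isna() does not detect them. A minimal sketch of normalizing the three flag columns to real booleans, assuming a missing flag should simply mean False (flag_cols is just an illustrative name):

```python
import pandas as pd

tbl2 = {"time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
                 "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
                 "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00",
                 "2022-07-29 14:16:15"],
        "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
        "target_long": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
        "stop_long": ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"]}
df2 = pd.DataFrame(tbl2)

flag_cols = ["entry", "target_long", "stop_long"]

# The string "NaN" is an ordinary value, so nothing is reported as missing
print(df2[flag_cols].isna().any().any())  # False

# Assumption: a missing flag means False. eq(True) compares element-wise,
# so True stays True and every "NaN" string becomes False.
for col in flag_cols:
    df2[col] = df2[col].eq(True)

df2["time"] = pd.to_datetime(df2["time"])
print(df2.dtypes)
```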

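When "entry" in df2 is NaN and ("stop_long" is True or "target_long" is True), I need to drop all the other rows of df2; but if a df2 time is in df, DON'T drop it, and start doing the same thing as before. In other words, for each time in df, keep that row plus the first later row where entry is NaN and either stop_long or target_long is True, and drop everything else.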
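One way to express this without grouping by calendar day is to number the "segments" that start at each df time and keep, per segment, the start row plus the first exit row. The sketch below does this with a cumulative sum over isin(). It assumes, as the expected output suggests, that only the first exit after each entry time is kept, that a "NaN" flag means False, and that rows before the first df time are dropped; the names is_start, segment, is_exit and first_exit are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"time": ["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
                   "entry": [True, True]})
df2 = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
             "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
             "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00",
             "2022-07-29 14:16:15"],
    "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
    "target_long": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
    "stop_long": ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"],
})

df["time"] = pd.to_datetime(df["time"])
df2["time"] = pd.to_datetime(df2["time"])
df2 = df2.sort_values("time").reset_index(drop=True)

# Treat the "NaN" strings as False so the flags become real booleans
entry = df2["entry"].eq(True)
target = df2["target_long"].eq(True)
stop = df2["stop_long"].eq(True)

# A row whose time appears in df starts a new segment
is_start = df2["time"].isin(df["time"])
# Segment number: 0 before the first df time, then 1, 2, ...
segment = is_start.astype(int).cumsum()

# Exit rows: no entry, and either the target or the stop was hit
is_exit = ~entry & (target | stop)
# Running count of exits inside each segment; the first exit has count 1
exit_count = is_exit.astype(int).groupby(segment).cumsum()
first_exit = is_exit & exit_count.eq(1)

df3 = df2[(is_start | first_exit) & segment.gt(0)].reset_index(drop=True)
print(df3)
```

Because the segments come from a running count of df times rather than from calendar dates, several entry times on the same day are handled independently.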

The result would be a dataframe like this:

df3

                  time  entry  target_long  stop_long
0  2022-07-28 13:35:00   True          NaN        NaN
1  2022-07-28 13:35:30    NaN          NaN       True
2  2022-07-29 14:15:00   True          NaN        NaN
3  2022-07-29 14:15:45    NaN         True        NaN

Any ideas?

Edit: I tried the solutions from both answers, but there is one case they did not account for, so I updated the sample code.

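# Parse the timestamps and use them as the index
df.time = pd.to_datetime(df.time)
df2.time = pd.to_datetime(df2.time)

df = df.set_index('time')
df2 = df2.set_index('time')

# The sample uses the string 'NaN' for missing flags; map it to False
df = df.replace('NaN', False).astype(bool)
df2 = df2.replace('NaN', False).astype(bool)

# For each calendar day: keep the rows whose time is in df plus the exit rows
# (entry False and target_long or stop_long True), then keep only the rows up to
# the later of the first target_long hit and the first stop_long hit
df3 = (df2.groupby(df2.index.date)
          .apply(lambda x: x[(~x.entry & (x.target_long | x.stop_long)) | x.index.isin(df.index)]
                [lambda y: y[(y.index <= y.target_long.idxmax()) | (y.index <= y.stop_long.idxmax())]])
          .droplevel(-2)   # drop the date level added by groupby
          .dropna(how='all')
          .reset_index())
print(df3)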

Output:

                 time entry target_long stop_long
0 2022-07-28 13:35:00  True         NaN       NaN
1 2022-07-28 13:35:30   NaN         NaN      True
2 2022-07-29 14:15:00  True         NaN       NaN
3 2022-07-29 14:15:45   NaN        True       NaN
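The cutoff in the second lambda relies on how Series.idxmax treats boolean data: it returns the label of the first True, and on an all-False Series it returns the first label. So (y.index <= t) | (y.index <= s) is the same as y.index <= max(t, s), i.e. everything up to the later of the two first hits. A small check on hypothetical flags for one day (the index and values below are made up for illustration):

```python
import pandas as pd

# Hypothetical filtered rows for one day: the entry time plus two exit candidates
idx = pd.to_datetime(["2022-07-29 14:15:00", "2022-07-29 14:15:45", "2022-07-29 14:16:15"])
target_long = pd.Series([False, True, True], index=idx)
stop_long = pd.Series([False, False, False], index=idx)

t = target_long.idxmax()  # first True -> 2022-07-29 14:15:45
s = stop_long.idxmax()    # no True, so the first label -> 2022-07-29 14:15:00
print(t, s)

# The answer's condition and the equivalent single comparison
mask_answer = (idx <= t) | (idx <= s)
mask_max = idx <= max(t, s)
print(mask_answer.tolist(), mask_max.tolist())
```

Two consequences follow from this: if a single day has both a target_long and a stop_long hit, every exit up to the later one is kept, and grouping by calendar date assumes there is at most one df time per day.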

Try:

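# Keep rows whose time appears in df, plus every row where entry is "NaN"
# and either stop_long or target_long is True
df3 = df2[df2["time"].isin(df["time"])
          | ((df2['entry'] == "NaN") & ((df2['stop_long'] == True) | (df2['target_long'] == True)))]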
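Run against the updated sample data (with time still stored as strings in both frames), this mask keeps every exit row, including the second target_long hit at 14:16:15, which is not in the expected df3; this is presumably the case the edit refers to. A quick check that rebuilds the sample frames from the question:

```python
import pandas as pd

df = pd.DataFrame({"time": ["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
                   "entry": [True, True]})
df2 = pd.DataFrame({
    "time": ["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
             "2022-07-28 13:35:45", "2022-07-29 14:15:00", "2022-07-29 14:15:15",
             "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00",
             "2022-07-29 14:16:15"],
    "entry": [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
    "target_long": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
    "stop_long": ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"],
})

# Same mask as the answer: entry times plus every exit row
mask = df2["time"].isin(df["time"]) | (
    (df2["entry"] == "NaN") & ((df2["stop_long"] == True) | (df2["target_long"] == True))
)
df3 = df2[mask]
print(df3["time"].tolist())
```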
