Pandas Dataframe Drop Lines by Condition

I created some data:

import pandas as pd
d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019, 09:57:23', '02.10.2019, 10:02:58', '02.10.2019, 13:11:58', '02.10.2019, 13:22:55']
     ,'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
     ,'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO' , 'CFO']}
df = pd.DataFrame(data=d)

    Time                    Action  Name
0   01.10.2019, 09:56:52    Opened  CTO
1   01.10.2019, 09:57:15    Closed  CTO
2   02.10.2019, 09:57:23    Opened  CFO
3   02.10.2019, 10:02:58    Closed  CFO
4   02.10.2019, 13:11:58    Opened  CFO
5   02.10.2019, 13:22:55    Closed  CFO

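Now I want to drop rows where the time between "Opened" and "Closed" is less than 5 minutes. If there are several rows with the same name, the rows between the first "Opened" action and the last "Closed" action should be dropped as well, so that each name always has an "Opened" row followed by a "Closed" row. I tried: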

mask = df.drop(df[pd.to_datetime(df["Time"]).diff().dt.seconds.gt(300)].index)

But this only gives me the first three rows. How can I do this?

My output should look like this:

    Time                    Action  Name
0   02.10.2019, 09:57:23    Opened  CFO
1   02.10.2019, 13:22:55    Closed  CFO

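That is because the first two rows are less than 5 minutes apart, and the third and fourth rows have the same name as the rows around them. But if the date is one day later, it should look like this: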

    Time                    Action  Name
2   02.10.2019, 09:57:23    Opened  CFO
3   02.10.2019, 10:02:58    Closed  CFO
4   03.10.2019, 13:11:58    Opened  CFO
5   03.10.2019, 13:22:55    Closed  CFO

Maybe not the cleanest way in the world, but it gets the job done:

import pandas as pd

d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019 09:57:23', '02.10.2019 10:02:58',
              '02.10.2019 13:11:58', '02.10.2019 13:22:55', '03.10.2019 14:20:44', '03.10.2019 14:30:44']
    , 'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
    , 'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO']}
df = pd.DataFrame(data=d)
# Some timestamps have a comma after the date; strip it so one explicit
# day-first format parses every row.
df['Time'] = pd.to_datetime(df['Time'].str.replace(',', '', regex=False),
                            format='%d.%m.%Y %H:%M:%S')
df.insert(1, 'Date', df['Time'].dt.date)

kept = []  # DataFrame.append was removed in pandas 2.0, so collect pieces and concat once
for _, group in df.groupby(['Name', 'Date']):
    first_open_idx = group.loc[group['Action'] == 'Opened', 'Time'].first_valid_index()
    last_close_idx = group.loc[group['Action'] == 'Closed', 'Time'].last_valid_index()

    if first_open_idx is not None and last_close_idx is not None:
        time_diff = group.loc[last_close_idx, 'Time'] - group.loc[first_open_idx, 'Time']
        # Keep only the first "Opened" and the last "Closed" row of the group,
        # and only when they are more than 5 minutes apart.
        if time_diff.total_seconds() > 300:
            kept.append(group.loc[[first_open_idx, last_close_idx]])

out = pd.concat(kept) if kept else df.iloc[0:0]
print(out)
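
If the explicit loop is a concern, the same rule can probably be written without it. The following is only a sketch under a few assumptions: the helper name `first_open_last_close` and its `min_seconds` parameter are made up for illustration, the frame has a default unnamed index, and "same day" means the calendar date of `Time`, as in the loop above.

```python
import pandas as pd


def first_open_last_close(df, min_seconds=300):
    """Hypothetical helper: per (Name, day), keep the first 'Opened' and the
    last 'Closed' row when they are more than min_seconds apart."""
    work = df.copy()
    # Strip the optional comma after the date so one format fits every row.
    work['Time'] = pd.to_datetime(work['Time'].str.replace(',', '', regex=False),
                                  format='%d.%m.%Y %H:%M:%S')
    work['Date'] = work['Time'].dt.date
    # Sort by time so "first" and "last" mean earliest and latest.
    work = work.sort_values('Time')

    # reset_index() exposes the original row labels as a column named 'index'
    # (this relies on the assumed default, unnamed index).
    opened = work[work['Action'] == 'Opened'].reset_index().groupby(['Name', 'Date']).first()
    closed = work[work['Action'] == 'Closed'].reset_index().groupby(['Name', 'Date']).last()

    # The inner join drops groups that lack either an 'Opened' or a 'Closed' row.
    pairs = opened.join(closed, lsuffix='_open', rsuffix='_close', how='inner')
    elapsed = (pairs['Time_close'] - pairs['Time_open']).dt.total_seconds()
    keep_idx = pairs.loc[elapsed > min_seconds, ['index_open', 'index_close']].to_numpy().ravel()

    return work.loc[sorted(keep_idx)].drop(columns='Date')


d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019 09:57:23', '02.10.2019 10:02:58',
              '02.10.2019 13:11:58', '02.10.2019 13:22:55', '03.10.2019 14:20:44', '03.10.2019 14:30:44'],
     'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed'],
     'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO']}
result = first_open_last_close(pd.DataFrame(d))
print(result)
```

On the sample data this keeps rows 2, 5, 6 and 7: the CTO pair is only 23 seconds apart and is dropped, while each CFO day keeps its first "Opened" and last "Closed" row.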
