Pandas Dataframe Drop Lines by Condition
I created some data:
import pandas as pd

d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019, 09:57:23',
              '02.10.2019, 10:02:58', '02.10.2019, 13:11:58', '02.10.2019, 13:22:55'],
     'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed'],
     'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO', 'CFO']}
df = pd.DataFrame(data=d)
                   Time  Action Name
0  01.10.2019, 09:56:52  Opened  CTO
1  01.10.2019, 09:57:15  Closed  CTO
2  02.10.2019, 09:57:23  Opened  CFO
3  02.10.2019, 10:02:58  Closed  CFO
4  02.10.2019, 13:11:58  Opened  CFO
5  02.10.2019, 13:22:55  Closed  CFO
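Now I want to delete rows by condition: rows should be dropped when the time is less than 5 minutes, and if there are multiple rows with the same name, the rows between the first "Opened" action and the last "Closed" should be dropped too. For each name, the action is always "Opened" first and then "Closed". I tried: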
mask = df.drop(df[pd.to_datetime(df["Time"]).diff().dt.seconds.gt(300)].index)
But this only shows the first three rows. How can I do that?
My output should look like this:
                   Time  Action Name
0  02.10.2019, 09:57:23  Opened  CFO
1  02.10.2019, 13:22:55  Closed  CFO
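Because the first two rows are less than 5 minutes apart, and the third and fourth rows have the same name as the rows before them. But if the last two entries were dated one day later, it should look like this: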
                   Time  Action Name
2  02.10.2019, 09:57:23  Opened  CFO
3  02.10.2019, 10:02:58  Closed  CFO
4  03.10.2019, 13:11:58  Opened  CFO
5  03.10.2019, 13:22:55  Closed  CFO
Maybe not the cleanest way in the world, but it gets the job done:
import pandas as pd

d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019, 09:57:23', '02.10.2019, 10:02:58',
              '02.10.2019, 13:11:58', '02.10.2019, 13:22:55', '03.10.2019, 14:20:44', '03.10.2019, 14:30:44'],
     'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed'],
     'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO']}
df = pd.DataFrame(data=d)

# Parse the day-first timestamps explicitly; by default 01.10 would be read as January 10
df['Time'] = pd.to_datetime(df['Time'], format='%d.%m.%Y, %H:%M:%S')
df.insert(1, 'Date', df['Time'].dt.date)

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
kept = []
for _, group in df.groupby(['Name', 'Date']):
    first_open_idx = group.loc[group['Action'] == 'Opened', 'Time'].first_valid_index()
    last_close_idx = group.loc[group['Action'] == 'Closed', 'Time'].last_valid_index()
    if first_open_idx is not None and last_close_idx is not None:
        time_diff = group.loc[last_close_idx, 'Time'] - group.loc[first_open_idx, 'Time']
        # total_seconds() measures the whole span; .seconds ignores the days component
        if time_diff.total_seconds() > 300:
            kept.append(group.loc[[first_open_idx, last_close_idx]])

out = pd.concat(kept) if kept else df.iloc[0:0]
print(out)
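If the explicit loop over groups feels heavy, the same rule (per name and calendar day, keep the first "Opened" and the last "Closed" when they are more than 5 minutes apart) can probably be expressed with `groupby(...).idxmin()` / `idxmax()`. This is only a sketch under a few assumptions: the helper name `keep_outer_events`, its `min_gap` parameter, and the temporary `Day` column are made up for illustration, and every timestamp is assumed to follow the `DD.MM.YYYY, HH:MM:SS` pattern.

```python
import pandas as pd


def keep_outer_events(df, min_gap=pd.Timedelta(minutes=5)):
    """Hypothetical helper: per (Name, day), keep the first 'Opened' and the
    last 'Closed' row if they are more than min_gap apart."""
    work = df.copy()
    # Assumption: all timestamps are day-first with a comma before the time
    work['Time'] = pd.to_datetime(work['Time'], format='%d.%m.%Y, %H:%M:%S')
    work['Day'] = work['Time'].dt.normalize()  # temporary grouping key
    keys = ['Name', 'Day']

    # Index label of the earliest "Opened" and the latest "Closed" per group
    first_open = work[work['Action'] == 'Opened'].groupby(keys)['Time'].idxmin()
    last_close = work[work['Action'] == 'Closed'].groupby(keys)['Time'].idxmax()

    # Inner join: only groups that contain both an "Opened" and a "Closed" row
    pairs = pd.concat([first_open.rename('open'), last_close.rename('close')],
                      axis=1, join='inner')

    # Look up the two timestamps and keep groups whose span exceeds min_gap
    gap = pairs['close'].map(work['Time']) - pairs['open'].map(work['Time'])
    pairs = pairs[gap > min_gap]

    # Select the surviving rows in their original order
    keep = work.index.isin(pairs.to_numpy().ravel())
    return work[keep].drop(columns='Day')


d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019, 09:57:23',
              '02.10.2019, 10:02:58', '02.10.2019, 13:11:58', '02.10.2019, 13:22:55',
              '03.10.2019, 14:20:44', '03.10.2019, 14:30:44'],
     'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed'],
     'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO']}
result = keep_outer_events(pd.DataFrame(d))
print(result)
```

The original index labels are kept, so `result.reset_index(drop=True)` can be applied afterwards if a fresh 0-based index is wanted, as in the expected output of the question.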