Calculate activity interval for a pandas DataFrame with datetime rows
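Given the following pandas DataFrame in Python, which uses datetime objects to show 5 light bulbs (ID_bulb 1 to 5) being switched on and off at different times: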
date ID_bulb switch using_time
1 2022-03-27 15:30:21+00:00 1 ON NaT
2 2022-03-29 17:05:21+00:00 1 OFF 2 days 01:35:00
3 2022-04-07 17:05:21+00:00 1 OFF NaT
4 2022-04-06 16:10:21+00:00 2 ON NaT
5 2022-04-07 15:30:21+00:00 2 OFF 0 days 23:20:00
6 2022-02-15 23:10:21+00:00 3 ON NaT
7 2022-02-16 02:10:21+00:00 3 OFF 0 days 03:00:00
8 2022-02-16 02:50:01+00:00 3 ON NaT
9 2022-02-18 10:50:01+00:00 3 OFF 2 days 08:00:00
10 2022-02-04 19:40:21+00:00 4 ON NaT
11 2022-02-06 15:35:21+00:00 4 OFF 1 days 19:55:00
12 2022-02-23 20:10:21+00:00 4 ON NaT
13 2022-02-24 02:10:21+00:00 4 OFF 0 days 06:00:00
14 2022-03-14 12:10:21+00:00 5 ON NaT
15 2022-03-15 00:10:21+00:00 5 ON NaT
16 2022-03-16 05:10:21+00:00 5 OFF 0 days 05:00:00
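To experiment without `TEST_STACK.csv`, the table above can be typed in directly. How `using_time` is derived is not stated, so the rule below is an assumption inferred from the data: it is set only on an OFF row whose previous row for the same bulb is ON, and holds the time elapsed since that ON row.

```python
import pandas as pd

# Sample data typed in from the table above (the original CSV is not available)
df = pd.DataFrame({
    "date": [
        "2022-03-27 15:30:21+00:00", "2022-03-29 17:05:21+00:00",
        "2022-04-07 17:05:21+00:00", "2022-04-06 16:10:21+00:00",
        "2022-04-07 15:30:21+00:00", "2022-02-15 23:10:21+00:00",
        "2022-02-16 02:10:21+00:00", "2022-02-16 02:50:01+00:00",
        "2022-02-18 10:50:01+00:00", "2022-02-04 19:40:21+00:00",
        "2022-02-06 15:35:21+00:00", "2022-02-23 20:10:21+00:00",
        "2022-02-24 02:10:21+00:00", "2022-03-14 12:10:21+00:00",
        "2022-03-15 00:10:21+00:00", "2022-03-16 05:10:21+00:00",
    ],
    "ID_bulb": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5],
    "switch": ["ON", "OFF", "OFF", "ON", "OFF", "ON", "OFF", "ON",
               "OFF", "ON", "OFF", "ON", "OFF", "ON", "ON", "OFF"],
}, index=range(1, 17))
df["date"] = pd.to_datetime(df["date"])  # tz-aware (UTC) timestamps

# Assumed rule: using_time is set only on an OFF row whose previous row
# for the same bulb is ON, and equals the time elapsed since that ON row.
prev_date = df.groupby("ID_bulb")["date"].shift()
prev_switch = df.groupby("ID_bulb")["switch"].shift()
off_after_on = (df["switch"] == "OFF") & (prev_switch == "ON")
df["using_time"] = (df["date"] - prev_date).where(off_after_on)
print(df)
```

Computed from the dates this way, row 16 comes out as `1 days 05:00:00` (its ON row is at 2022-03-15 00:10:21), not the `0 days 05:00:00` shown in the table.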
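I want to add a new column called `cost_days`. It should only hold a value in the rows where the variable `using_time` is not `NaT`, and that value is the number of times the bulb stayed on for at least `n` consecutive hours within the night period defined by `start_time` to `end_time`.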
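For a single ON/OFF interval, this definition can be checked night by night: each night window starts at `start_time` on some day and ends at `end_time` (on the following day when the window crosses midnight). Because one ON interval is continuous, its overlap with one night window is a single consecutive stretch, so it is enough to count the windows where that overlap reaches `n` hours. A minimal plain-Python sketch of that reading; `count_long_nights` is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta, timezone


def count_long_nights(on, off, n_hours, start_time, end_time):
    """Count night windows that overlap the ON interval [on, off] for >= n_hours."""
    start_t = time.fromisoformat(start_time)
    end_t = time.fromisoformat(end_time)
    crosses_midnight = end_t <= start_t  # e.g. 22:00 -> 07:00
    count = 0
    # Start one day early: a window that began the previous evening may still overlap
    day = on.date() - timedelta(days=1)
    while day <= off.date():
        night_start = datetime.combine(day, start_t, tzinfo=on.tzinfo)
        end_day = day + timedelta(days=1) if crosses_midnight else day
        night_end = datetime.combine(end_day, end_t, tzinfo=on.tzinfo)
        # Overlap is negative when the interval and the window do not meet
        overlap = min(off, night_end) - max(on, night_start)
        if overlap >= timedelta(hours=n_hours):
            count += 1
        day += timedelta(days=1)
    return count


utc = timezone.utc
# Bulb 1, rows 1 -> 2 of the table
print(count_long_nights(datetime(2022, 3, 27, 15, 30, 21, tzinfo=utc),
                        datetime(2022, 3, 29, 17, 5, 21, tzinfo=utc),
                        5, "22:00:00", "07:00:00"))  # prints 2
```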
Example of the resulting DataFrame for the call:
add_costdays_column(df, 5, "22:00:00", "07:00:00")
date ID_bulb switch using_time cost_days
1 2022-03-27 15:30:21+00:00 1 ON NaT 0
2 2022-03-29 17:05:21+00:00 1 OFF 2 days 01:35:00 2
3 2022-04-07 17:05:21+00:00 1 OFF NaT 0
4 2022-04-06 16:10:21+00:00 2 ON NaT 0
5 2022-04-07 15:30:21+00:00 2 OFF 0 days 23:20:00 1
6 2022-02-15 23:10:21+00:00 3 ON NaT 0
7 2022-02-16 02:10:21+00:00 3 OFF 0 days 03:00:00 0
8 2022-02-16 02:50:01+00:00 3 ON NaT 0
9 2022-02-18 10:50:01+00:00 3 OFF 2 days 08:00:00 2
10 2022-02-04 19:40:21+00:00 4 ON NaT 0
11 2022-02-06 15:35:21+00:00 4 OFF 1 days 19:55:00 2
12 2022-02-23 20:10:21+00:00 4 ON NaT 0
13 2022-02-24 02:10:21+00:00 4 OFF 0 days 06:00:00 0
14 2022-03-14 12:10:21+00:00 5 ON NaT 0
15 2022-03-15 00:10:21+00:00 5 ON NaT 0
16 2022-03-16 05:10:21+00:00 5 OFF 0 days 05:00:00 1
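For the rows whose `using_time` is not `NaT`, only the ON row immediately before the OFF row is taken as the moment the bulb was switched on. This is to simplify the problem; I will adapt it to the specific cases later.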
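Under that simplification, the requested `add_costdays_column(df, n, start_time, end_time)` could be sketched as below. This is a hypothetical implementation, not the answer's method: it measures the exact overlap between each ON interval and every night window instead of sampling hourly timestamps. `_nights_over` is an assumed helper name, and `df["date"]` is assumed to already be a datetime column.

```python
import pandas as pd


def _nights_over(on, off, n_hours, start, end):
    # Count night windows that start at `start` on some day and end at `end`
    # (on the following day when end <= start, i.e. the window crosses midnight)
    # and overlap the ON interval [on, off] for at least n_hours.
    count = 0
    wrap = pd.Timedelta(days=1) if end <= start else pd.Timedelta(0)
    # Start one day early: a window that began the previous evening may still overlap
    first_day = on.normalize() - pd.Timedelta(days=1)
    for day in pd.date_range(first_day, off.normalize(), freq="D"):
        night_start = day + start
        night_end = day + wrap + end
        overlap = min(off, night_end) - max(on, night_start)
        if overlap >= pd.Timedelta(hours=n_hours):
            count += 1
    return count


def add_costdays_column(df, n, start_time, end_time):
    """Hypothetical implementation of the function requested in the question.

    Assumes df["date"] is a datetime column and that the row just before each
    non-NaT using_time row is the matching ON event.
    """
    start, end = pd.Timedelta(start_time), pd.Timedelta(end_time)
    out = df.copy()
    out["cost_days"] = 0
    on_dates = out["date"].shift()  # date of the preceding (ON) row
    for idx in out.index[out["using_time"].notna()]:
        out.loc[idx, "cost_days"] = _nights_over(
            on_dates.loc[idx], out.loc[idx, "date"], n, start, end
        )
    return out
```

Taking `start_time` and `end_time` as `pd.Timedelta` offsets from midnight keeps the arithmetic in pandas, so tz-aware timestamps such as the `+00:00` dates above work unchanged.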
Use:
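```python
import numpy as np
import pandas as pd

df = pd.read_csv('TEST_STACK.csv', parse_dates=['date'])
# Drop trailing rows after the last row that has a using_time value
df = df.iloc[:df[df['using_time'].notna()].index[-1] + 1]

# solution
# Group each non-NaT using_time row together with the NaT rows just above it
g = df['using_time'].notna().sort_index(ascending=False).cumsum()
g = (g - max(g)).abs()

def rounder(x):
    # Hourly timestamps from the ON time (second-to-last row) to the OFF time (last row)
    v = pd.date_range(list(x)[-2], list(x)[-1], freq='1h')
    # Keep only the timestamps inside the 22:00-07:00 night window
    temp = pd.Series(v, index=v).between_time('22:00', '07:00')
    # Express the night hours as a number of 9-hour nights
    temp = len(temp) / 9
    # Round down unless the leftover fraction is at least 6/9 of a night
    return np.floor(temp) if np.mod(temp, 1.0) < 6/9 else np.ceil(temp)

temp = df.groupby(g)['date'].apply(rounder)
df.loc[df[df['using_time'].notna()].index, 'cost_days'] = temp.values
df['cost_days'] = df['cost_days'].fillna(0)
```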
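The `rounder` step relies on `between_time` wrapping past midnight when the start time is later than the end time, and then divides the number of hourly samples by 9, the length of the 22:00-07:00 window. A small check of that step on bulb 1's first interval:

```python
import pandas as pd

# Hourly samples over bulb 1's first ON interval, as built inside rounder()
v = pd.date_range("2022-03-27 15:30:21+00:00", "2022-03-29 17:05:21+00:00", freq="1h")
# between_time wraps past midnight because 22:00 is later than 07:00
night = pd.Series(v, index=v).between_time("22:00", "07:00")
print(len(v), len(night), len(night) / 9)  # prints: 50 18 2.0
```

This counts hourly samples pooled over the whole interval rather than checking each night separately, so it approximates the "at least `n` consecutive hours per night" rule; the window and the 6/9 threshold are hard-coded and do not use `n`.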