Calculate activity interval for a pandas DataFrame with datetime rows

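Given the following pandas DataFrame in Python:

It shows light bulbs (IDs 1 to 5) being switched on and off at different times, with the switch times stored as datetime objects.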

                        date       ID_bulb     switch       using_time
1  2022-03-27 15:30:21+00:00             1        ON               NaT
2  2022-03-29 17:05:21+00:00             1       OFF   2 days 01:35:00
3  2022-04-07 17:05:21+00:00             1       OFF               NaT
4  2022-04-06 16:10:21+00:00             2        ON               NaT
5  2022-04-07 15:30:21+00:00             2       OFF   0 days 23:20:00
6  2022-02-15 23:10:21+00:00             3        ON               NaT
7  2022-02-16 02:10:21+00:00             3       OFF   0 days 04:00:00
8  2022-02-16 02:50:01+00:00             3        ON               NaT
9  2022-02-18 10:50:01+00:00             3       OFF   2 days 08:00:00
10 2022-02-04 19:40:21+00:00             4        ON               NaT
11 2022-02-06 15:35:21+00:00             4       OFF   1 days 19:55:00
12 2022-02-23 20:10:21+00:00             4        ON               NaT
13 2022-02-24 02:10:21+00:00             4       OFF   0 days 06:00:00
14 2022-03-14 12:10:21+00:00             5        ON               NaT
15 2022-03-15 00:10:21+00:00             5        ON               NaT
16 2022-03-16 05:10:21+00:00             5       OFF   0 days 05:00:00
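To make the question reproducible, the frame can be rebuilt in code. The sketch below uses only the bulb 1 and 2 rows and assumes that using_time is the OFF row's date minus the date of the immediately preceding ON row of the same bulb (this matches rows 2, 5 and 11 of the table); that derivation is an assumption, not something the question states.

```python
import pandas as pd

# Rows 1-5 copied from the table above (bulbs 1 and 2 only)
df = pd.DataFrame(
    {
        "date": pd.to_datetime([
            "2022-03-27 15:30:21+00:00",
            "2022-03-29 17:05:21+00:00",
            "2022-04-07 17:05:21+00:00",
            "2022-04-06 16:10:21+00:00",
            "2022-04-07 15:30:21+00:00",
        ]),
        "ID_bulb": [1, 1, 1, 2, 2],
        "switch": ["ON", "OFF", "OFF", "ON", "OFF"],
    },
    index=range(1, 6),
)

# Date and switch state of the previous row of the same bulb
prev_date = df.groupby("ID_bulb")["date"].shift()
prev_switch = df.groupby("ID_bulb")["switch"].shift()

# Assumed rule: using_time exists only for an OFF row directly after an ON row
is_off_after_on = df["switch"].eq("OFF") & prev_switch.eq("ON")
df["using_time"] = (df["date"] - prev_date).where(is_off_after_on)
print(df)
```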

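I want to add a new column called cost_days. It is filled only for rows where using_time is not NaT, and holds the number of nights on which the bulb was on for at least n consecutive hours within the night period defined by start_time and end_time. All other rows get 0.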
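A minimal sketch of the per-row rule for a single ON/OFF interval, assuming each night window is anchored on the calendar day it starts and may wrap past midnight (22:00 to 07:00 ends the next morning). `count_nights` is a hypothetical helper name. Because one ON/OFF interval is contiguous, its overlap with each night window is one contiguous stretch, so "at least n consecutive hours" reduces to "overlap of at least n hours".

```python
from datetime import date, datetime, time, timedelta


def count_nights(on: datetime, off: datetime, n: float,
                 start_time: str, end_time: str) -> int:
    """Count nights whose window [start_time, end_time] overlaps the
    interval [on, off] for at least n hours (hypothetical helper)."""
    start_t = time.fromisoformat(start_time)
    end_t = time.fromisoformat(end_time)
    wraps = end_t <= start_t            # e.g. 22:00 -> 07:00 ends the next day
    need = timedelta(hours=n)
    count = 0
    # A night is labelled by the day its window starts; begin one day early so
    # the window that started the evening before `on` is also checked.
    day: date = on.date() - timedelta(days=1)
    while day <= off.date():
        w_start = datetime.combine(day, start_t, tzinfo=on.tzinfo)
        end_day = day + timedelta(days=1) if wraps else day
        w_end = datetime.combine(end_day, end_t, tzinfo=on.tzinfo)
        # Length of the single stretch where the bulb is on inside this window
        overlap = min(off, w_end) - max(on, w_start)
        if overlap >= need:
            count += 1
        day += timedelta(days=1)
    return count


# Row 2 of the example: ON 2022-03-27 15:30:21, OFF 2022-03-29 17:05:21
print(count_nights(datetime(2022, 3, 27, 15, 30, 21),
                   datetime(2022, 3, 29, 17, 5, 21), 5, "22:00:00", "07:00:00"))  # 2
```

The function also accepts tz-aware `pandas.Timestamp` values, since those are `datetime` subclasses; the window is then interpreted in the timestamps' own time zone.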

Example of the resulting DataFrame for the following call:

add_costdays_column(df, 5, "22:00:00", "07:00:00")
                        date       ID_bulb     switch       using_time    cost_days
1  2022-03-27 15:30:21+00:00             1        ON               NaT         0
2  2022-03-29 17:05:21+00:00             1       OFF   2 days 01:35:00         2
3  2022-04-07 17:05:21+00:00             1       OFF               NaT         0
4  2022-04-06 16:10:21+00:00             2        ON               NaT         0
5  2022-04-07 15:30:21+00:00             2       OFF   0 days 23:20:00         1
6  2022-02-15 23:10:21+00:00             3        ON               NaT         0
7  2022-02-16 02:10:21+00:00             3       OFF   0 days 04:00:00         0
8  2022-02-16 02:50:01+00:00             3        ON               NaT         0
9  2022-02-18 10:50:01+00:00             3       OFF   2 days 08:00:00         2
10 2022-02-04 19:40:21+00:00             4        ON               NaT         0
11 2022-02-06 15:35:21+00:00             4       OFF   1 days 19:55:00         2
12 2022-02-23 20:10:21+00:00             4        ON               NaT         0
13 2022-02-24 02:10:21+00:00             4       OFF   0 days 06:00:00         0
14 2022-03-14 12:10:21+00:00             5        ON               NaT         0
15 2022-03-15 00:10:21+00:00             5        ON               NaT         0
16 2022-03-16 05:10:21+00:00             5       OFF   0 days 05:00:00         1

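Only the ON row immediately preceding an OFF row with a non-NaT using_time value counts as the moment the bulb was switched on. This simplifies the problem; I will adapt it to the specific case later.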
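One way to implement the requested function, as a sketch: pair each OFF row that has a non-NaT using_time with the previous row of the same bulb when that row is ON (the simplification stated above), then count qualifying nights per pair. It assumes the columns date, ID_bulb, switch and using_time as in the table, and that the night window is read in the timestamps' own time zone (UTC here); DST transitions are not handled.

```python
import pandas as pd


def add_costdays_column(df: pd.DataFrame, n: float,
                        start_time: str, end_time: str) -> pd.DataFrame:
    """Return a copy of df with a cost_days column: for each OFF row with a
    non-NaT using_time, the number of nights in which the bulb stayed on for
    at least n consecutive hours inside the start_time..end_time window."""
    out = df.copy()
    start = pd.Timedelta(start_time)        # "22:00:00" -> 22 h after midnight
    end = pd.Timedelta(end_time)            # "07:00:00" -> 7 h after midnight
    if end <= start:                        # window wraps past midnight
        end += pd.Timedelta(days=1)
    need = pd.Timedelta(hours=n)

    # Previous row of the same bulb; only an ON row directly before the
    # OFF row counts as the switch-on time.
    prev = out.groupby("ID_bulb")[["date", "switch"]].shift()
    paired = out["using_time"].notna() & prev["switch"].eq("ON")

    out["cost_days"] = 0
    for idx in out.index[paired.to_numpy()]:
        on, off = prev.at[idx, "date"], out.at[idx, "date"]
        nights = 0
        # Candidate nights start on each calendar day from the day before `on`
        days = pd.date_range(on.normalize() - pd.Timedelta(days=1),
                             off.normalize(), freq="D")
        for day in days:
            # Contiguous time the bulb is on inside this night's window
            overlap = min(off, day + end) - max(on, day + start)
            if overlap >= need:
                nights += 1
        out.at[idx, "cost_days"] = nights
    return out
```

Applied to the 16-row frame with `add_costdays_column(df, 5, "22:00:00", "07:00:00")`, this gives 2, 1, 0, 2, 2 and 0 for rows 2, 5, 7, 9, 11 and 13, as in the expected output. For row 16 it gives 2 rather than 1: between ON at 2022-03-15 00:10:21 and OFF at 2022-03-16 05:10:21 the bulb is on for about 6 h 50 min of the night starting 2022-03-14 and 7 h 10 min of the next one.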

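Use:

import numpy as np
import pandas as pd

# Parse 'date' as datetimes and turn "NaT" strings in using_time into real NaT
df = pd.read_csv('TEST_STACK.csv', parse_dates=['date'])
df['using_time'] = pd.to_timedelta(df['using_time'])
# Drop trailing rows after the last row that has a using_time value
df = df.iloc[:df[df['using_time'].notna()].index[-1] + 1]

# solution
# Label blocks of rows so that each block ends on a row with a non-NaT using_time
g = df['using_time'].notna().sort_index(ascending=False).cumsum()
g = (g - g.max()).abs()

def rounder(x):
    # Hourly timestamps from the ON row (second to last) to the OFF row (last)
    v = pd.date_range(x.iloc[-2], x.iloc[-1], freq='1h')
    # Keep only the samples inside the night window (hard-coded 22:00-07:00)
    temp = pd.Series(v, index=v).between_time('22:00', '07:00')
    # About 9 hourly samples fall into one full night
    temp = len(temp) / 9
    # Round up when the leftover fraction of a night is at least 6/9
    return np.floor(temp) if np.mod(temp, 1.0) < 6 / 9 else np.ceil(temp)

temp = df.groupby(g)['date'].apply(rounder)
df.loc[df[df['using_time'].notna()].index, 'cost_days'] = temp.values
df['cost_days'] = df['cost_days'].fillna(0).astype(int)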
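The least obvious step in the answer is the group label `g`. On a toy boolean column (True marks a row whose using_time is not NaT), the reversed cumulative sum gives each closing row and the rows above it, back to the previous closing row, one shared label; subtracting the maximum and taking the absolute value renumbers the groups 0, 1, 2, ... from top to bottom. The toy data is made up for illustration.

```python
import pandas as pd

# True where the row closes an ON/OFF interval (non-NaT using_time)
closes = pd.Series([False, True, False, False, True, False, True])

# Count closing rows from the bottom up: a closing row and the rows above it
# (back to the previous closing row) share the same counter value
g = closes.sort_index(ascending=False).cumsum()
# Flip the numbering so labels increase from top to bottom, starting at 0
g = (g - g.max()).abs()
print(g.sort_index().tolist())  # [0, 0, 1, 1, 1, 2, 2]
```

Because every group ends on a closing row, `x.iloc[-2]` inside `rounder` is the row just before the OFF row, which is the ON row under the question's simplification. A group of a single row (an OFF row directly after another closing row) would make `x.iloc[-2]` raise an `IndexError`.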
