Pandas groupby on time series data based on the result of a function

Question

I am analyzing power systems time series data, and I am trying to find the runs of contiguous data points that reach or exceed a certain threshold value.

I currently do this manually in Excel, applying a formula row by row, but while searching for a more efficient method I realized that it could be done with the pandas groupby function in Python.

However, in the examples that I have read, groupby only groups rows that share the same label. What I would like to do is pass groupby a function that checks whether the value is >= 3 and then groups those values, indexed by the start and end time of each run in which the value stays >= 3.

Input:

+-------+---------+------+
| Index |  Time   | Value|
+-------+---------+------+
|     0 | 00:00:01|   3  |
|     1 | 00:00:02|   4  |
|     2 | 00:00:03|   5  |
|     3 | 00:00:04|   2  |
|     4 | 00:00:05|   6  |
|     5 | 00:00:06|   7  |
|     6 | 00:00:07|   1  |
|     7 | 00:00:08|   9  |
+-------+---------+------+

Output:

+-------+-----------+----------+--------+
| Index | TimeStart | TimeEnd  | Value  |
+-------+-----------+----------+--------+
|     0 | 00:00:01  | 00:00:03 |  3,4,5 |
|     1 | 00:00:05  | 00:00:06 |  6,7   |
|     2 | 00:00:08  | 00:00:08 |  9     |
+-------+-----------+----------+--------+

Answer 1

Create a mask that is True where Value is less than 3.
Take the cumulative sum of the mask to create group labels for the runs where Value is greater than or equal to 3.
Filter df by the inverted mask, then groupby the group labels.
Use agg to apply several functions at once.
Rename the columns.
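To see why the cumulative sum works as a group label, it can help to look at the intermediate columns. The following is a minimal sketch, assuming the sample input from the question is rebuilt by hand as a DataFrame named df with Time kept as plain strings; the column names below and group are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample input from the question, rebuilt by hand (assumption: Time stays a string).
df = pd.DataFrame({
    "Time": ["00:00:01", "00:00:02", "00:00:03", "00:00:04",
             "00:00:05", "00:00:06", "00:00:07", "00:00:08"],
    "Value": [3, 4, 5, 2, 6, 7, 1, 9],
})

mask = df["Value"].lt(3)   # True where the value is below the threshold
grp = mask.cumsum()        # counter that increases at every below-threshold row

# Show the helper columns side by side with the data ("below" and "group" are illustrative names).
steps = df.assign(below=mask, group=grp)
print(steps)

# Rows at or above the threshold that sit between two breaks share one label.
print(grp[~mask].tolist())
```

Every row below the threshold bumps the counter, so all qualifying rows between two such breaks end up with the same label. If several below-threshold rows occur in a row, some label values are simply skipped, which does not affect the grouping.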

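# df is the input table above, with columns Time and Value
mask = df.Value.lt(3)   # True where Value is below the threshold
grp = mask.cumsum()     # group label: increases at each below-threshold row

# keep only the rows at or above the threshold, grouped by label
d1 = df[~mask].groupby(grp[~mask]).agg(dict(
    Time=['first', 'last'],
    Value=lambda x: ','.join(map(str, x))
))

# flatten the two-level column index produced by agg
d1.columns = ['TimeStart', 'TimeEnd', 'Value']

d1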

      TimeStart   TimeEnd  Value
Value                           
0      00:00:01  00:00:03  3,4,5
1      00:00:05  00:00:06    6,7
2      00:00:08  00:00:08      9
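For reuse, the same idea can be wrapped in a function. This is only a sketch under some assumptions: the function name runs_at_or_above and its parameters are hypothetical, the sample DataFrame is rebuilt by hand from the question, and it uses named aggregation (available in pandas 0.25 and later) in place of the dict passed to agg above, so no column renaming is needed afterwards.

```python
import pandas as pd


def runs_at_or_above(df, threshold, time_col="Time", value_col="Value"):
    """Summarise each contiguous run of rows whose value is >= threshold.

    Hypothetical helper, not part of the original answer.
    """
    mask = df[value_col].lt(threshold)      # True where the run is broken
    grp = mask.cumsum().rename("run")       # label shared by rows between breaks
    kept = df[~mask]                        # rows at or above the threshold
    out = kept.groupby(grp[~mask]).agg(
        TimeStart=(time_col, "first"),
        TimeEnd=(time_col, "last"),
        Value=(value_col, lambda x: ",".join(map(str, x))),
    )
    return out.reset_index(drop=True)       # number the runs 0, 1, 2, ...


# Sample input from the question, rebuilt by hand (assumption: Time stays a string).
df = pd.DataFrame({
    "Time": ["00:00:01", "00:00:02", "00:00:03", "00:00:04",
             "00:00:05", "00:00:06", "00:00:07", "00:00:08"],
    "Value": [3, 4, 5, 2, 6, 7, 1, 9],
})

result = runs_at_or_above(df, threshold=3)
print(result)
```

On the sample data this should give the same three runs as the output shown above. If Time is parsed with pd.to_datetime first, the first and last aggregations return timestamps instead of strings, which makes run durations easy to compute.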

Answer 1
3, accepted, 2017-10-29 00:55:56
