pandas groupby timeseries data according to function result
I am analyzing power systems time series data, and I am trying to find the contiguous data points that go beyond a certain threshold value.
I currently do this manually in Excel, applying a formula row by row, but while searching for a more efficient method I realized that it could be done with the pandas groupby function in Python.
However, in the examples I have read, groupby only groups rows that share the same label. What I would like to do is pass groupby a function that checks whether the value is >= 3, and then group those values, indexed by the start and end times of each breach of the threshold (value >= 3).
Input:
+-------+---------+------+
| Index | Time | Value|
+-------+---------+------+
| 0 | 00:00:01| 3 |
| 1 | 00:00:02| 4 |
| 2 | 00:00:03| 5 |
| 3 | 00:00:04| 2 |
| 4 | 00:00:05| 6 |
| 5 | 00:00:06| 7 |
| 6 | 00:00:07| 1 |
| 7 | 00:00:08| 9 |
+-------+---------+------+
Output:
+-------+-----------+----------+--------+
| Index | TimeStart | TimeEnd | Value |
+-------+-----------+----------+--------+
| 0 | 00:00:01 | 00:00:03 | 3,4,5 |
| 1 | 00:00:05 | 00:00:06 | 6,7 |
| 2 | 00:00:08 | 00:00:08 | 9 |
+-------+-----------+----------+--------+
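Create a mask of where Value is less than 3.
Take the cumulative sum of the mask to label the groups.
Filter df by the mask (keeping the rows where it is False), then groupby the group labels.
Use agg to pass several functions at once.

# True where Value is below the threshold of 3
mask = df.Value.lt(3)
# The running total steps up at every below-threshold row, so each contiguous run of values >= 3 shares one label
grp = mask.cumsum()
# Keep only the rows at or above the threshold, group them by label, and aggregate
d1 = df[~mask].groupby(grp[~mask]).agg(dict(
    Time=['first', 'last'],
    Value=lambda x: ','.join(map(str, x))
))
# Replace the two-level column index produced by agg with flat names
d1.columns = ['TimeStart', 'TimeEnd', 'Value']
d1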
      TimeStart   TimeEnd  Value
Value
0      00:00:01  00:00:03  3,4,5
1      00:00:05  00:00:06    6,7
2      00:00:08  00:00:08      9
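For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask-and-cumsum idea. The DataFrame is rebuilt from the input table in the question. The helper name find_breaches and its threshold parameter are my own illustrative choices, not part of pandas or of the code above, and I have swapped the dict passed to agg for named aggregation, which should give the final column names directly.

```python
import pandas as pd

# Sample data copied from the input table in the question.
df = pd.DataFrame({
    "Time": ["00:00:01", "00:00:02", "00:00:03", "00:00:04",
             "00:00:05", "00:00:06", "00:00:07", "00:00:08"],
    "Value": [3, 4, 5, 2, 6, 7, 1, 9],
})


def find_breaches(frame, threshold=3):
    """Hypothetical helper: one row per contiguous run with Value >= threshold."""
    below = frame["Value"].lt(threshold)   # True where the value is under the threshold
    run_id = below.cumsum()                # steps up at every below-threshold row
    runs = frame[~below].groupby(run_id[~below]).agg(
        TimeStart=("Time", "first"),       # first timestamp of the run
        TimeEnd=("Time", "last"),          # last timestamp of the run
        Value=("Value", lambda s: ",".join(map(str, s))),
    )
    # The group labels skip numbers when several below-threshold rows are
    # adjacent, so renumber the result 0..n-1.
    return runs.reset_index(drop=True)


result = find_breaches(df)
print(result)
```

The reset_index(drop=True) call is optional. It only matters if you want the result numbered consecutively, as in the desired output table, when the data contains several below-threshold rows in a row.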