Count on a rolling time window in pandas

Question

I'm trying to return a count over a time window around a (moving) fixed point.

It's an attempt to understand the condition of an instrument at any time, as a function of its usage before that time.

So if the instrument is used at 12.05pm, 12.10, 12.15, 12.30, 12.40 and 1pm, the usage counts would be:

12.05 -> 1 (once in the last hour)

12.10 -> 2

12.15 -> 3

12.30 -> 4

12.40 -> 5

1.00 -> 6

... but then let's say usage resumes at 1.06:

1.06 -> 6

This doesn't increase the count, as the first run was over an hour ago.

How can I calculate this count and append it as a column?

It feels like this is a groupby/aggregate/count, possibly using timedeltas in a lambda function, but I don't know where to start beyond that.

I'd like to be able to play with the time window too, so not just the past hour but also the hour surrounding an instance, i.e. + and - 30 minutes.

The following code gives a starting dataframe:

import pandas as pd

s = pd.Series(pd.date_range('2020-1-1', periods=8000, freq='250s'))
df = pd.DataFrame({'Run time': s})
df_sample = df.sample(6000)
df_sample = df_sample.sort_index()

The best help I found (and to be fair, I can usually hack something together from the logic) was the question "Distinct count on a rolling time window", but I haven't managed it this time.

Thanks

Answer 1

I've done something similar previously with the DataFrame.rolling function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html

So for your dataset, first you need to set the index to the datetime field, then you can perform the analysis you need. Continuing on from your code:

import pandas as pd

s = pd.Series(pd.date_range('2020-1-1', periods=8000, freq='250s'))
df = pd.DataFrame({'Run time': s})
df_sample = df.sample(6000)
df_sample = df_sample.sort_index()

# Create a column of ones so that a rolling sum acts as a count
df_sample['Occurrences'] = 1

# Set the index to the datetime element
df_sample = df_sample.set_index('Run time')

# Use the pandas rolling method with a time-based window, 3600s = 1 hour
# (each window ends at, and includes, the current row)
df_sample['Occurrences in Last Hour'] = df_sample['Occurrences'].rolling('3600s').sum()

df_sample.head(15)

                     Occurrences  Occurrences in Last Hour
Run time                                                   
2020-01-01 00:00:00            1                       1.0
2020-01-01 00:04:10            1                       2.0
2020-01-01 00:08:20            1                       3.0
2020-01-01 00:12:30            1                       4.0
2020-01-01 00:16:40            1                       5.0
2020-01-01 00:25:00            1                       6.0
2020-01-01 00:29:10            1                       7.0
2020-01-01 00:37:30            1                       8.0
2020-01-01 00:50:00            1                       9.0
2020-01-01 00:54:10            1                      10.0
2020-01-01 00:58:20            1                      11.0
2020-01-01 01:02:30            1                      11.0
2020-01-01 01:06:40            1                      11.0
2020-01-01 01:15:00            1                      10.0
2020-01-01 01:19:10            1                      10.0
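As a quick check against the example in the question, here is a minimal sketch that applies the same rolling sum to the seven usage times listed there. The date is arbitrary and the variable names are only for illustration. With the default settings the window covers the hour up to and including the current run, so a run exactly one hour earlier would fall outside it.

```python
import pandas as pd

# Usage times from the question; the date itself is an arbitrary assumption.
times = pd.to_datetime([
    "2020-01-01 12:05", "2020-01-01 12:10", "2020-01-01 12:15",
    "2020-01-01 12:30", "2020-01-01 12:40", "2020-01-01 13:00",
    "2020-01-01 13:06",
])

# One row per run, indexed by time, with a constant 1 to sum over.
occurrences = pd.Series(1, index=times, name="Occurrences")

# Time-based rolling window of one hour ending at each run.
counts = occurrences.rolling("3600s").sum().astype(int)

print(counts.tolist())  # [1, 2, 3, 4, 5, 6, 6]
```

The last value stays at 6 because the 12.05 run is more than an hour before 1.06, which matches the counts you were expecting.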

You need to set the index to a datetime element (or pass the datetime column through the `on` parameter of DataFrame.rolling) to use a time-based window; otherwise you can only use integer windows, which correspond to a number of rows.
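For the other part of the question, the hour surrounding each run (+ and - 30 minutes), one option that does not depend on rolling at all is to count with numpy.searchsorted on the sorted timestamps. This is only a sketch: `count_in_window` is a hypothetical helper name, not part of pandas, and it assumes the times are already sorted ascending. Unlike the rolling default, both ends of the window are included here.

```python
import numpy as np
import pandas as pd


def count_in_window(times, before, after):
    """For each timestamp t, count the timestamps in [t - before, t + after].

    Hypothetical helper; `times` must be a DatetimeIndex sorted ascending.
    """
    values = np.asarray(times, dtype="datetime64[ns]")
    lower = np.asarray(times - before, dtype="datetime64[ns]")
    upper = np.asarray(times + after, dtype="datetime64[ns]")
    # First position at or after the lower edge of each window.
    start = np.searchsorted(values, lower, side="left")
    # First position strictly after the upper edge of each window.
    stop = np.searchsorted(values, upper, side="right")
    return stop - start


# Usage times from the question; the date itself is an arbitrary assumption.
times = pd.to_datetime([
    "2020-01-01 12:05", "2020-01-01 12:10", "2020-01-01 12:15",
    "2020-01-01 12:30", "2020-01-01 12:40", "2020-01-01 13:00",
    "2020-01-01 13:06",
])

half_hour = pd.Timedelta(minutes=30)
centred_counts = count_in_window(times, half_hour, half_hour)
print(centred_counts.tolist())
```

Passing `before=pd.Timedelta(hours=1)` and `after=pd.Timedelta(0)` gives a trailing window instead, although it then also counts a run exactly one hour earlier, which the rolling default does not.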

Answer 1
3, accepted, 2020-04-01 11:01:55
