Finding the average over a user-defined window in pandas

Question

I have a pandas DataFrame of raw heart rate data, indexed by time (in seconds).

I am trying to bin the data so that I can take the average over a user-defined window (e.g. 10 s). This is not a rolling average: just the average of the first 10 s, then of the following 10 s, and so on.

import pandas as pd

hr_raw = pd.read_csv('hr_data.csv', index_col='time')
print(hr_raw)

      heart_rate
time            
0.6        164.0
1.0        182.0
1.3        164.0
1.6        150.0
2.0        152.0
2.4        141.0
2.9        163.0
3.2        141.0
3.7        124.0
4.2        116.0
4.7        126.0
5.1        116.0
5.7        107.0

Using the example data above, I would like to set a user-defined window size (let's use 2 seconds) and produce a new DataFrame whose index is in 2-second increments and which averages the 'heart_rate' values whose time falls into each window (continuing to the end of the DataFrame).

For example:

      heart_rate
time            
2.0        162.40
4.0        142.25
6.0        116.25

I can only seem to find methods that bin the data into a predetermined number of bins (e.g. making a histogram), and these only return the count/frequency.

Thanks.

Answer 1

A groupby should do it.

df.groupby((df.index // 2 + 1) * 2).mean()

      heart_rate
time            
2.0       165.00
4.0       144.20
6.0       116.25

Note that the slight difference between our results arises because the upper bound of each window is excluded. That means a reading taken at exactly 2.0 s is assigned to the 4.0 s interval. This is how it is usually done, and a similar solution with TimeGrouper will yield the same result.
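To make the boundary behaviour concrete, here is a minimal sketch that rebuilds the sample data from the question and compares the floor-based label with a ceiling-based one. The names `floor_labels`, `ceil_labels`, and `labels` are illustrative only and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
times = [0.6, 1.0, 1.3, 1.6, 2.0, 2.4, 2.9, 3.2, 3.7, 4.2, 4.7, 5.1, 5.7]
rates = [164.0, 182.0, 164.0, 150.0, 152.0, 141.0, 163.0,
         141.0, 124.0, 116.0, 126.0, 116.0, 107.0]
df = pd.DataFrame({"heart_rate": rates}, index=pd.Index(times, name="time"))

window = 2
t = df.index.to_numpy()

# Floor-based label: window [0, 2) -> 2, [2, 4) -> 4, ... (upper bound excluded)
floor_labels = (t // window + 1) * window
# Ceiling-based label: window (0, 2] -> 2, (2, 4] -> 4, ... (upper bound included)
ceil_labels = np.ceil(t / window) * window

# Side-by-side view of the label each reading receives
labels = pd.DataFrame(
    {"floor_label": floor_labels, "ceil_label": ceil_labels}, index=df.index
)
print(labels.loc[[2.0]])  # the only reading that sits exactly on a boundary

# Means under the floor-based (upper bound excluded) convention
floor_means = df.groupby(floor_labels)["heart_rate"].mean()
print(floor_means)
```

Only the reading at 2.0 s changes bucket between the two conventions, which accounts for the different means in the first two windows.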

Answer 2

As coldspeed pointed out, a reading at 2 s will be counted in the 4 s window. However, if you need it in the 2 s bucket instead, you can use np.ceil:

In [1038]: df.groupby(np.ceil(df.index/2)*2).mean()
Out[1038]:
      heart_rate
time
2.0       162.40
4.0       142.25
6.0       116.25

2 answers

Answer 1
1 Accepted 2017-09-18 03:50:56

Answer 2
1 2017-09-18 05:03:19
