
Find the average for user-defined window in pandas

I have a pandas DataFrame of raw heart rate data, indexed by time (in seconds).

I am trying to bin the data so that I can take the average over a user-defined window (e.g. 10 s). This is not a rolling average: I want the average of the first 10 s, then of the following 10 s, and so on.

import pandas as pd

hr_raw = pd.read_csv('hr_data.csv', index_col='time')
print(hr_raw)

      heart_rate
time            
0.6        164.0
1.0        182.0
1.3        164.0
1.6        150.0
2.0        152.0
2.4        141.0
2.9        163.0
3.2        141.0
3.7        124.0
4.2        116.0
4.7        126.0
5.1        116.0
5.7        107.0

Using the example data above, I would like to set a user-defined window size (let's use 2 seconds) and produce a new DataFrame whose index advances in 2-second increments and whose 'heart_rate' values are the averages of the readings whose time falls into each window, continuing to the end of the DataFrame.

For example:

      heart_rate
time            
2.0        162.40
4.0        142.25
6.0        116.25

I can only seem to find methods that bin the data into a predetermined number of bins (e.g. making a histogram), and these only return the count/frequency.

Thanks.

A groupby should do it.

df.groupby((df.index // 2 + 1) * 2).mean()

      heart_rate
time            
2.0       165.00
4.0       144.20
6.0       116.25

Note that the slight difference between our answers arises because the upper bound of each window is excluded here. That means a reading taken at 2.0 s is counted in the window labelled 4.0. This is how it is usually done; a similar solution with the TimeGrouper will yield the same result.
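To make the boundary behaviour concrete, here is a minimal, self-contained sketch of the floor-division approach. It rebuilds the sample data from the question inline instead of reading 'hr_data.csv', and the names `window`, `labels` and `binned` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
times = [0.6, 1.0, 1.3, 1.6, 2.0, 2.4, 2.9, 3.2, 3.7, 4.2, 4.7, 5.1, 5.7]
rates = [164.0, 182.0, 164.0, 150.0, 152.0, 141.0, 163.0,
         141.0, 124.0, 116.0, 126.0, 116.0, 107.0]
df = pd.DataFrame({"heart_rate": rates}, index=pd.Index(times, name="time"))

window = 2  # window length in seconds (assumed, user-defined)

# Floor division assigns each reading to the half-open window
# [start, start + window); the label is the window's upper edge.
labels = (df.index // window + 1) * window

# The reading taken at exactly 2.0 s receives the label 4.0.
label_at_two = labels[df.index.get_loc(2.0)]

binned = df.groupby(labels).mean()
print(binned)
```

With `window = 2` this gives the same three rows as the output above. For window sizes that are not exactly representable as floats (0.1, say), readings that sit on a boundary may land on either side, so treat this as a sketch and not a robust implementation.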

As coldspeed pointed out, a reading at 2 s is counted in the 4 s window. However, if you need it in the 2 s bucket, you can round up instead:

In [1038]: df.groupby(np.ceil(df.index/2)*2).mean()
Out[1038]:
      heart_rate
time
2.0       162.40
4.0       142.25
6.0       116.25
