Pandas: Calculate the average over all of the columns for n rolling rows at a time
I have a time series and I want to calculate a rolling average over n rows across multiple columns. What I did initially was to add another column containing the average of each row, and then take a standard rolling average of that column over n rows.
However, when some of the columns have no value, that throws off my calculation.
Example:
Col1 | Col2 | Col3 | Avg
10 | 20 | | 15
| 10 | | 10
10 | 15 | 20 | 15
Rolling average of Avg: 13.33
While it should be: 14.17 (85 / 6, the mean of the six values actually present)
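To make the discrepancy concrete, here is a minimal sketch that rebuilds the example table as a DataFrame (the variable names are illustrative, and the empty cells are assumed to be NaN). It compares the mean of the per-row means with the mean pooled over every value that is present.

```python
import numpy as np
import pandas as pd

# The example table, with empty cells assumed to be NaN.
df = pd.DataFrame({
    "Col1": [10, np.nan, 10],
    "Col2": [20, 10, 15],
    "Col3": [np.nan, np.nan, 20],
})

# Mean of per-row means: every row gets equal weight,
# no matter how many values it actually holds.
mean_of_row_means = df.mean(axis=1).mean()

# Pooled mean: total of all present values divided by how many there are.
pooled_mean = df.sum().sum() / df.count().sum()

print(round(mean_of_row_means, 2))  # 13.33
print(round(pooled_mean, 2))        # 14.17
```

The two figures differ because the second row contributes a single value but still counts as a full third of the first result.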
Here is an example that worked for me, where every cell has a value:
Col1 | Col2 | Col3 | Avg
10 | 20 | 15 | 15
10 | 10 | 10 | 10
10 | 15 | 20 | 15
Rolling average of Avg: 13.33
And it should be: 13.33
I could write a manual loop, or add a second column containing the number of elements in each row.
But is there a better way to do it?
np.nanmean averages every non-NaN value in a multi-dimensional array:
np.nanmean(df.values)
14.166666666666666
To apply this over a rolling window of 3 rows, you could do this:
pd.Series({df.index[i]: np.nanmean(df.iloc[i-2:i+1].values) for i in range(2, len(df))})
2 14.166667
dtype: float64
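A vectorized alternative, sketched here under the assumption that empty cells are NaN, follows the "count column" idea from the question: take the rolling sum of the per-row sums and divide it by the rolling sum of the per-row counts. The names `n`, `row_sums`, `row_counts` and `rolling_avg` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# The example table from the question, with empty cells assumed to be NaN.
df = pd.DataFrame({
    "Col1": [10, np.nan, 10],
    "Col2": [20, 10, 15],
    "Col3": [np.nan, np.nan, 20],
})

n = 3  # window length in rows

row_sums = df.sum(axis=1)      # per-row total; NaN is skipped by default
row_counts = df.count(axis=1)  # per-row number of non-NaN values

# Pooled mean of each window: sum of all values / number of values.
rolling_avg = row_sums.rolling(n).sum() / row_counts.rolling(n).sum()
print(rolling_avg)
```

On the example data the first two entries are NaN, because the window is not yet full, and the third is 14.166667, matching the np.nanmean result. This avoids the Python-level loop, which may matter on longer series.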