What I am trying to do is this: I have a time series and I want to calculate a rolling average over n rows across multiple columns. My initial approach was to add a column holding the average of each row and then take a standard rolling average of that column over n rows. However, when some of the columns have no value in a row, this throws off my calculations.
Example:
Col1 | Col2 | Col3 | Avg
10 | 20 | | 15
| 10 | | 10
10 | 15 | 20 | 15
Rolling average of Avg: 13.33
While it should be: 14.17 (the mean of all six values, 85 / 6)
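To see where the gap comes from, here is a small sketch that rebuilds the first table as a pandas DataFrame with NaN in the empty cells (that representation is an assumption about how the data is held). Averaging the row averages gives every row equal weight, while the expected figure gives every individual value equal weight, so the two only agree when every row has the same number of values.

```python
import numpy as np
import pandas as pd

# First example table; empty cells are assumed to be NaN
df = pd.DataFrame({
    "Col1": [10, np.nan, 10],
    "Col2": [20, 10, 15],
    "Col3": [np.nan, np.nan, 20],
})

# Per-row mean (NaN skipped by default), then a 3-row rolling mean of those row means
row_avg = df.mean(axis=1)
mean_of_means = row_avg.rolling(3).mean().iloc[-1]

# Mean over every non-empty cell in the same three rows
pooled = np.nanmean(df.to_numpy())

print(round(mean_of_means, 2))  # 13.33
print(round(pooled, 2))         # 14.17
```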
Here is an example with every cell filled in, where my approach does work:
Col1 | Col2 | Col3 | Avg
10 | 20 | 15 | 15
10 | 10 | 10 | 10
10 | 15 | 20 | 15
Rolling average of Avg: 13.33
And it should be: 13.33
I could write a manual loop, or add a second column holding the number of non-empty values in each row.
But is there a better way to do it?
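The count-column idea can be done without a manual loop: divide the rolling sum of the row totals by the rolling sum of the row counts. This is a minimal sketch, assuming the data sits in a pandas DataFrame with NaN for missing cells; `rolling_pooled_mean` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def rolling_pooled_mean(df: pd.DataFrame, n: int) -> pd.Series:
    """Mean of all non-missing cells in each window of n consecutive rows."""
    row_sum = df.sum(axis=1)      # NaN skipped; an all-NaN row sums to 0
    row_count = df.count(axis=1)  # number of non-missing cells in each row
    # Total of the values in the window divided by how many values there were
    return row_sum.rolling(n).sum() / row_count.rolling(n).sum()

# First example table from the question, empty cells as NaN
df = pd.DataFrame({
    "Col1": [10, np.nan, 10],
    "Col2": [20, 10, 15],
    "Col3": [np.nan, np.nan, 20],
})

result = rolling_pooled_mean(df, 3)
# Rows 0 and 1 are NaN (incomplete windows); row 2 is 85 / 6, about 14.1667
print(result)
```

Because it only uses vectorised `sum`, `count` and `rolling(...).sum()`, it scales to long series, and each value carries the same weight in its window regardless of how many cells its row had.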
np.nanmean averages every element of a multi-dimensional array, ignoring NaN values:
np.nanmean(df.values)
14.166666666666666
To apply this over a rolling window of 3 rows, you could do:
pd.Series({df.index[i]: np.nanmean(df.iloc[i-2:i+1].values) for i in range(2, len(df))})
2 14.166667
dtype: float64
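The comprehension above hard-codes a window of 3 rows. A sketch that takes the window size n as a parameter follows; the name `rolling_nanmean` is hypothetical, and it also silences the RuntimeWarning that np.nanmean emits (while returning NaN) when a window contains no values at all.

```python
import warnings

import numpy as np
import pandas as pd

def rolling_nanmean(df: pd.DataFrame, n: int) -> pd.Series:
    """np.nanmean over each block of n consecutive rows, keyed by the block's last index label."""
    out = {}
    for i in range(n - 1, len(df)):
        window = df.iloc[i - n + 1 : i + 1].to_numpy(dtype=float)
        with warnings.catch_warnings():
            # An all-NaN window makes np.nanmean warn "Mean of empty slice" and return NaN
            warnings.simplefilter("ignore", category=RuntimeWarning)
            out[df.index[i]] = np.nanmean(window)
    return pd.Series(out, dtype=float)

# First example table from the question, empty cells as NaN
df = pd.DataFrame({
    "Col1": [10, np.nan, 10],
    "Col2": [20, 10, 15],
    "Col3": [np.nan, np.nan, 20],
})

s3 = rolling_nanmean(df, 3)  # one window: rows 0-2, about 14.1667
s2 = rolling_nanmean(df, 2)  # windows ending at rows 1 and 2
print(s3)
print(s2)
```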