filter out rolling mean results with limited data

Question

I am trying to calculate the rolling mean of a time series. The calculation itself works. However, the results show places along the series where the rolling mean is based on only one or two values surrounded by a long run of missing values. I would like the rolling average to be computed only when more than 50% of the values within the rolling window are present. If 50% or less of the data is available, the result for that index should be NaN.

I wrote some example code to demonstrate what I am trying to accomplish.

library(zoo)

#Create example data
set.seed(12)
dat1=runif(20,min=0,max=10)
dat2=dat1
ind=which(dat2 %in% sample(dat2,5))
#in this case ind=c(4, 7, 8, 13, 16)
dat2[ind]=NA

dat3=dat1
ind2=which(dat3 %in% sample(dat3,12))
#in this case ind2=c(2, 5, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18)
dat3[ind2]=NA

#create a time series
now <- Sys.time()
tseq <- seq(from = now, length.out = 20, by = "mins")

#data in zoo format
dat1=zoo(dat1,tseq)
dat2=zoo(dat2,tseq)
dat3=zoo(dat3,tseq)

#rolling mean using rollapply
dat1rollmean=rollapply(dat1,width=5,align='center',FUN=function(x) mean(x,na.rm=TRUE))
dat2rollmean=rollapply(dat2,width=5,align='center',FUN=function(x) mean(x,na.rm=TRUE))
dat3rollmean=rollapply(dat3,width=5,align='center',FUN=function(x) mean(x,na.rm=TRUE))

#doesn't work: rollmean() is not designed for input containing NAs
dat3newrollmean=rollmean(dat3,5)

#desired rolling mean result (positions refer to the 16-element rolled series)
dat2des=dat2rollmean
dat2des[4]=NaN

dat3des=dat3rollmean
dat3des[c(4:14)]=NaN

In this example, dat1 is a complete dataset for which my rollapply function (width 5) works well, while dat2 and dat3 have different amounts of missing data. In this case, I want to replace with NaN every result for which rollapply averaged fewer than 3 non-missing values, i.e. 50% or less of the width-5 window. That would be index 4 of dat2rollmean and indices 4-14 of dat3rollmean. How would I write a function that finds these windows with insufficient data and replaces the corresponding rolling mean with NaN?
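One way to find the windows with insufficient data is to roll a count of non-missing values over the series. The sketch below is an illustration, not part of the question or answer: `flag_sparse`, `x2` and `x3` are hypothetical names, and `x2`/`x3` are stand-ins that use the NA positions listed in the question's comments rather than the seeded random data.

```r
library(zoo)

# Hypothetical helper: positions in the centred, width-k rolled output whose
# window holds k/2 or fewer non-missing values (for k = 5: fewer than 3)
flag_sparse <- function(x, k = 5) {
  n_valid <- rollapply(as.numeric(!is.na(x)), width = k, FUN = sum, align = "center")
  which(n_valid <= k / 2)
}

# Stand-ins with the same NA positions as dat2 and dat3 in the question
x2 <- as.numeric(1:20)
x2[c(4, 7, 8, 13, 16)] <- NA
x3 <- as.numeric(1:20)
x3[c(2, 5, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18)] <- NA

flag_sparse(x2)  # returns 4
flag_sparse(x3)  # returns 4 through 14
```

These positions match the ones the question marks as NaN in dat2des and dat3des.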

Answer 1

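Use the function Mean defined below. It returns the mean of the non-missing values only when fewer than half of the values in the window are NA, and NaN otherwise: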

Mean <- function(x) if (sum(is.na(x)) < length(x) / 2) mean(x, na.rm = TRUE) else NaN
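To see the threshold in action, this sketch applies Mean to single hand-made windows. The input values are illustrative and do not come from the question's data.

```r
# Mean from the answer, repeated so this snippet runs on its own
Mean <- function(x) if (sum(is.na(x)) < length(x) / 2) mean(x, na.rm = TRUE) else NaN

Mean(c(2, NA, 4, 6, NA))   # 2 of 5 missing: mean of 2, 4, 6, i.e. 4
Mean(c(2, NA, NA, 6, NA))  # 3 of 5 missing: NaN
Mean(c(2, NA, 4, NA))      # exactly half missing: NaN, since more than half must be present
```

Because the comparison is strict, an even-width window with exactly half of its values missing is also rejected, which matches the "more than 50%" requirement.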

res1 <- rollapply(dat1, 5, Mean)
identical(res1, dat1rollmean)
## [1] TRUE

res2 <- rollapply(dat2, 5, Mean)
identical(res2, dat2des)
## [1] TRUE

res3 <- rollapply(dat3, 5, Mean)
identical(res3, dat3des)
## [1] TRUE
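For long series, calling an R function once per window can be slow. A possible alternative, sketched here under assumptions and not taken from the answer, computes two rolling sums with zoo's `rollsum`: one over the values with NAs replaced by 0 and one over a 0/1 indicator of observed values. `rollmean_sparse`, its `min_frac` parameter and the stand-in series `v` are hypothetical names. Since the sums are computed differently from `mean()`, results on non-integer data may differ from Mean in the last few bits.

```r
library(zoo)

# Hypothetical helper (not from the answer): centred rolling mean of width k
# that is NaN unless more than min_frac of the window is non-missing
rollmean_sparse <- function(z, k = 5, min_frac = 0.5) {
  valid <- !is.na(coredata(z))
  filled <- zoo(ifelse(valid, coredata(z), 0), index(z))  # NAs contribute 0 to the sum
  counts <- zoo(as.numeric(valid), index(z))              # 1 for each observed value
  s <- rollsum(filled, k)  # sum of the observed values in each window
  n <- rollsum(counts, k)  # number of observed values in each window
  zoo(ifelse(coredata(n) > min_frac * k, coredata(s) / coredata(n), NaN), index(n))
}

# Stand-in with the same NA positions as dat2 in the question
v <- as.numeric(1:20)
v[c(4, 7, 8, 13, 16)] <- NA
r <- rollmean_sparse(zoo(v))
which(is.nan(coredata(r)))  # returns 4
```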

Answer 1 was accepted (2020-12-01 19:20:01).