
Filter out rolling mean results with limited data

I am trying to calculate the rolling mean of a time series. The calculation itself works, but in the results there are locations along the time series where the rolling mean is based on only one or two values surrounded by a long run of missing values. I would like the rolling mean to be computed only when more than 50% of the data within the window over which the average is taken is available. If less than 50% of the data is available, the result for that index should be NaN .

I wrote some example code to hopefully demonstrate what I am trying to accomplish.

library(zoo)

#Create example data
set.seed(12)
dat1=runif(20,min=0,max=10)
dat2=dat1
ind=which(dat2 %in% sample(dat2,5))
#in this case ind=c(4, 7, 8, 13, 16)
dat2[ind]=NA

dat3=dat1
ind2=which(dat3 %in% sample(dat3,12))
#in this case ind2=c(2, 5, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18)
dat3[ind2]=NA

#create a time series
now <- Sys.time()
tseq <- seq(from = now, length.out = 20, by = "mins")

#data in zoo format
dat1=zoo(dat1,tseq)
dat2=zoo(dat2,tseq)
dat3=zoo(dat3,tseq)

#rolling mean using rollapply, ignoring NAs within each window
dat1rollmean=rollapply(dat1,width=5,align='center',FUN=function(x) mean(x,na.rm=TRUE))
dat2rollmean=rollapply(dat2,width=5,align='center',FUN=function(x) mean(x,na.rm=TRUE))
dat3rollmean=rollapply(dat3,width=5,align='center',FUN=function(x) mean(x,na.rm=TRUE))

#doesn't work when the data contain NAs
dat3newrollmean=rollmean(dat3,5)

#desired rolling mean result
dat2des=dat2rollmean
dat2des[4]=NaN

dat3des=dat3rollmean
dat3des[c(4:14)]=NaN

In this example, dat1 is a complete dataset for which my rollapply call (width of 5) works well, while dat2 and dat3 have different amounts of missing data. In this case I would want to replace with NaN any point where the rolling mean is computed from fewer than 3 of the 5 values in the window. That would be index 4 of dat2rollmean and indexes 4-14 of dat3rollmean . How would I write a function to find these instances of insufficient data and replace the resulting rolling mean with NaN ?

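Use Mean defined below. It returns the mean of the non-missing values when fewer than half of the values in the window are NA, and NaN otherwise: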

Mean <- function(x) if (sum(is.na(x)) < length(x) / 2) mean(x, na.rm = TRUE) else NaN
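
To see how the condition behaves, it may help to apply Mean to a couple of windows by hand. The vectors below are made-up illustrations, not values taken from dat2 or dat3. With a width of 5, the test sum(is.na(x)) < length(x) / 2 passes only when at most 2 values are missing, that is, when at least 3 are present.

```r
# Same definition as in the answer
Mean <- function(x) if (sum(is.na(x)) < length(x) / 2) mean(x, na.rm = TRUE) else NaN

# Hypothetical windows of width 5 (illustrative values only)
w_ok  <- c(1, NA, 3, NA, 5)   # 2 of 5 missing: enough data
w_bad <- c(1, NA, NA, NA, 5)  # 3 of 5 missing: too little data

Mean(w_ok)   # 3, the mean of 1, 3 and 5
Mean(w_bad)  # NaN
```

Note that for an even width, a window with exactly half of its values missing also yields NaN, which matches the "greater than 50%" requirement in the question.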

res1 <- rollapply(dat1, 5, Mean)
identical(res1, dat1rollmean)
## [1] TRUE

res2 <- rollapply(dat2, 5, Mean)
identical(res2, dat2des)
## [1] TRUE

res3 <- rollapply(dat3, 5, Mean)
identical(res3, dat3des)
## [1] TRUE
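
If the 50% threshold needs to vary, one possible generalization is a small factory that builds a function like Mean for a given minimum fraction of non-missing values. The names make_mean and min_frac are hypothetical and not part of zoo; this is only a sketch under that assumption. The zoo call at the end is guarded so that the block still runs if zoo is not installed, and the series z is made-up data.

```r
# Hypothetical helper: returns a function that gives the mean of the
# non-missing values only if more than `min_frac` of the window is present,
# and NaN otherwise.
make_mean <- function(min_frac = 0.5) {
  function(x) {
    if (mean(!is.na(x)) > min_frac) mean(x, na.rm = TRUE) else NaN
  }
}

Mean50 <- make_mean(0.5)   # same rule as Mean above
Mean75 <- make_mean(0.75)  # stricter: more than 75% must be present

x <- c(2, NA, 4, NA, 6)    # illustrative window, 60% present
Mean50(x)  # 4
Mean75(x)  # NaN

if (requireNamespace("zoo", quietly = TRUE)) {
  z <- zoo::zoo(c(1, NA, NA, NA, 5, 6, 7))
  # fill = NA pads both ends so the result has the same length as z
  print(zoo::rollapply(z, 5, Mean50, fill = NA))
}
```

Passing fill = NA to rollapply is optional; without it the result is shorter than the input by width - 1 points, as in the res1, res2 and res3 results above.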


 