
Moving average that takes into account NAs in value and gaps in available dates

I'm working with a time series that spans 2008 to 2015, but am restricting attention to the months of March to August in each year. To further complicate the matter, some values have been marked NA.

Here's what a subset (not sorted by date) of the df looks like:

  Date       Value   Site
1 2008-08-20     NA  Kenya
2 2008-08-29 12.954  Kenya
3 2008-08-18 29.972  Kenya
4 2008-08-16  5.080  Kenya
5 2009-04-21  3.048  Kenya
6 2009-04-22 12.954  Kenya

Probably an unimportant detail since subsetting is pretty straightforward, but in the interest of clarifying the purpose of the Site column, I'll mention there are five sites in all with time series data over the same span.

I want to add a column Value10 that gives a 10-day moving average. I've found this can be easily accomplished using one of several packages, such as zoo or TTR, but I want the moving average to be sensitive to the date and site so that it

  • generates an NA for the day if any one of the previous 10 values is NA
  • generates an NA for the day when the previous 10 values include a jump in the Date, e.g. going from August of 2008 to March of 2009
  • is sensitive to which Site's data it's acting on

The data from the question is replicated for Congo, and a width of 2 is used instead of 10 so that the example does not produce a trivial result of all NA:

# data for DF

Lines <- "  Date       Value   Site
2008-08-20     NA  Kenya
2008-08-29 12.954  Kenya
2008-08-18 29.972  Kenya
2008-08-16  5.080  Kenya
2009-04-21  3.048  Kenya
2009-04-22 12.954  Kenya
2008-08-20     NA  Congo
2008-08-29 12.954  Congo
2008-08-18 29.972  Congo
2008-08-16  5.080  Congo
2009-04-21  3.048  Congo
2009-04-22 12.954  Congo"

# set up DF, convert Date column to "Date" class

DF <- read.table(text = Lines, header = TRUE)
DF$Date <- as.Date(DF$Date)

Sort the rows by Site and Date, then use ave to compute the rolling mean within each Site and year/month group:

# sort rows
o <- order(DF$Site, DF$Date)
DF <- DF[o, ]

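# perform rolling mean
library(zoo)
# w <- 10  # width asked for in the question
w <- 2     # width used here so the small sample is not all NA
# pad with w-1 leading NAs so the result has the same length as x
roll <- function(x) rollapplyr(c(rep(NA, w-1), x), w, mean)
# apply roll separately within each Site and year/month group
DF$mean <- ave(DF$Value, DF$Site, as.yearmon(DF$Date), FUN = roll)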
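
To see what roll returns for a single group, here is a minimal base-R sketch of the same trailing mean that does not need zoo. The name roll_base is hypothetical and not part of the original answer; it assumes stats::filter with sides = 1 is an acceptable stand-in for rollapplyr. The first w-1 positions are NA, and any window containing an NA yields NA.

```r
# Hypothetical base-R equivalent of roll() for one group (assumption: not from the original answer).
w <- 2

roll_base <- function(x) {
  # stats::filter() stops if the filter is longer than the series, so guard short groups
  if (length(x) < w) return(rep(NA_real_, length(x)))
  # sides = 1 uses only the current and previous w-1 values (trailing window)
  as.numeric(stats::filter(x, rep(1 / w, w), sides = 1))
}

# Kenya, August 2008, sorted by date
x <- c(5.080, 29.972, NA, 12.954)
roll_base(x)
# NA 17.526 NA NA
```

Passing FUN = roll_base to the ave call should give the same column as the zoo version on this sample.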

With the zoo-based roll applied, DF is now:

> DF
         Date  Value  Site   mean
10 2008-08-16  5.080 Congo     NA
9  2008-08-18 29.972 Congo 17.526
7  2008-08-20     NA Congo     NA
8  2008-08-29 12.954 Congo     NA
11 2009-04-21  3.048 Congo     NA
12 2009-04-22 12.954 Congo  8.001
4  2008-08-16  5.080 Kenya     NA
3  2008-08-18 29.972 Kenya 17.526
1  2008-08-20     NA Kenya     NA
2  2008-08-29 12.954 Kenya     NA
5  2009-04-21  3.048 Kenya     NA
6  2009-04-22 12.954 Kenya  8.001
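
Note that grouping by year/month restarts the window at every month boundary and does not look at gaps inside a month: the 2008-08-29 rows are NA only because the preceding value is NA, not because nine days are missing. If the window should be NA whenever its rows are not consecutive calendar days, one possible sketch in base R follows. The helper roll_strict and the column Value2 are hypothetical names, and the code assumes the rows are sorted by date within each site with at most one row per site and day.

```r
# Hypothetical helper (assumption, not from the original answer):
# trailing mean over w rows, NA unless those rows are w consecutive days.
roll_strict <- function(date, value, w) {
  n <- length(value)
  out <- rep(NA_real_, n)
  if (n < w) return(out)
  for (i in w:n) {
    first <- i - w + 1
    span <- as.numeric(difftime(date[i], date[first], units = "days"))
    # w consecutive days span exactly w - 1 days; mean() is NA if any value is NA
    if (span == w - 1) out[i] <- mean(value[first:i])
  }
  out
}

df <- data.frame(
  Date  = as.Date(c("2008-08-29", "2008-08-30", "2008-08-31",
                    "2009-03-01", "2009-03-02",
                    "2008-08-30", "2008-08-31")),
  Value = c(1, 3, NA, 4, 6, 10, 20),
  Site  = c(rep("Kenya", 5), rep("Congo", 2))
)
df <- df[order(df$Site, df$Date), ]

w <- 2
# ave() over row indices lets the helper see both Date and Value per Site
df$Value2 <- ave(seq_len(nrow(df)), df$Site,
                 FUN = function(i) roll_strict(df$Date[i], df$Value[i], w))
df$Value2
# NA 15 NA 2 NA NA 5
```

Here the Kenya row for 2009-03-01 is NA because of the jump from August 2008, while 2009-03-02 gets a value again.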

UPDATES: Rearranged the presentation and changed the ave line to use yearmon.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 