More efficient way to calculate a daily means time series which includes dates not found in the original series (in R)?

I'm wondering if there is a function or package (zoo, perhaps?) that will let me calculate daily (or other) means of a time series against a second, independent series of dates. There are several questions on SO that deal with creating e.g. daily means, but none that allow grouping by an independent series.

So far I have been doing this in two or more steps: first calculating the means with the aggregate function, then matching them to a full sequence of dates. The following example is a typical situation for me, where some days do not contain any values:

set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
df <- data.frame(datetime, y)
df <- df[-sample(n, n*0.2),] #remove 20%
plot(y ~ datetime, df, t="l")

#calculate daily means
df$date <- as.Date(df$datetime)
daymean <- aggregate(y ~ date, data=df, mean)

#create daily means ts including all possible dates
date.ran <- range(df$date)
df2 <- data.frame(date=seq(date.ran[1], date.ran[2], by="days"), y=NaN)
MATCH <- match(daymean$date, df2$date)
df2$y[MATCH] <- daymean$y

plot(y ~ datetime, df, cex=0.5, pch=20)
lines(as.POSIXlt(df2$date), df2$y, t="o", col=rgb(1,0,0,0.5))
legend("topright", legend=c("Orig.", "daily mean"), col=c(1,rgb(1,0,0,0.5)), lty=c(NA, 1), pch=c(20, 1))

[Image: plot of the original series with the daily means overlaid]

set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
#df <- data.frame(datetime, y)
#df <- df[-sample(n, n*0.2),] #remove 20%

#You should set the values to NA instead of removing them
df <- data.frame(datetime, y)
df[sample(n, n*0.2), "y"] <- NA  # set 20% of the y values to NA

library(xts)
myxts <- as.xts(df$y,order.by=df$datetime)
ep <- endpoints(myxts,'days')
daymeans <- period.apply(myxts, INDEX=ep, FUN=mean, na.rm=TRUE)

plot(myxts,cex=0.5, pch=20, type="p")
lines(daymeans)
points(daymeans, col="red")

[Image: plot of the xts series with the daily means overlaid]

Note, however, that the result is indexed by POSIXct timestamps (the time of the last observation in each day), which you may want to convert to dates or round to noon for plotting.
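The index conversion mentioned above could be sketched roughly as follows. This is only an illustration on a small made-up series: the names toy, by_date, at_noon, all_days and full are hypothetical and not part of the original answer, and the last step (merging with a zero-width xts object to add the days that have no observations) is an assumption about what the question is after.

```r
library(xts)

# Hypothetical toy series: six observations over four calendar days (UTC),
# with no observations at all on 2000-01-03
times <- as.POSIXct(c("2000-01-01 03:00", "2000-01-01 18:00",
                      "2000-01-02 06:00", "2000-01-02 21:00",
                      "2000-01-04 09:00", "2000-01-04 12:00"), tz = "UTC")
toy <- xts(c(1, 3, NA, 4, 10, 20), order.by = times)

# Daily means, indexed by the last timestamp of each day
daymeans <- period.apply(toy, INDEX = endpoints(toy, "days"),
                         FUN = mean, na.rm = TRUE)

# Option 1: replace the POSIXct index with calendar dates
by_date <- daymeans
index(by_date) <- as.Date(index(daymeans))

# Option 2: move every timestamp to noon of its day
at_noon <- daymeans
index(at_noon) <- as.POSIXct(paste(as.Date(index(daymeans)), "12:00:00"),
                             tz = "UTC")

# Add the dates that have no data by merging with a zero-width xts object
all_days <- seq(start(by_date), end(by_date), by = "day")
full <- merge(by_date, xts(, all_days))
```

Here full has one row per calendar day, with NA on the day that has no observations.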

1) zoo Here is how it would be done with zoo. z2, the end result, is the series of means, one per day. We read columns 1 and 2 of df into a zoo object z, compute the daily means, m, and create a grid, g, which is a zero-width zoo object covering every date from the first to the last. Then we merge the means and the grid, filling the dates that have no data with NaN.

library(zoo)
z <- read.zoo(df[1:2], FUN = identity)

m <- aggregate(z, as.Date, mean)
g <- zoo(, seq(start(m), end(m), by = "day"))
z2 <- merge(m, g, fill = NaN)

coredata(z2) is the data and time(z2) is the dates.
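If a plain data frame like df2 from the question is wanted, the two accessors can be combined. The following is a minimal sketch on made-up data; the toy data frame and the name df2_zoo are hypothetical, chosen only for illustration.

```r
library(zoo)

# Hypothetical toy data: five observations, none on 2000-01-03
df <- data.frame(
  datetime = as.POSIXct(c("2000-01-01 03:00", "2000-01-01 18:00",
                          "2000-01-02 06:00", "2000-01-04 09:00",
                          "2000-01-04 12:00"), tz = "UTC"),
  y = c(1, 3, 5, 10, 20)
)

# Same steps as above: read, aggregate by date, merge with a daily grid
z  <- read.zoo(df, FUN = identity)
m  <- aggregate(z, as.Date, mean)
g  <- zoo(, seq(start(m), end(m), by = "day"))
z2 <- merge(m, g, fill = NaN)

# Rebuild a data frame from the index and the data
df2_zoo <- data.frame(date = time(z2), y = as.numeric(coredata(z2)))
```

The as.numeric() call is only a precaution so that y ends up as an ordinary numeric column.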

2) zoo & magrittr Another way of expressing this, using zoo and the magrittr package, is with this pipeline:

library(zoo)
library(magrittr)

df[1:2] %>%
    read.zoo(FUN = identity) %>%
    aggregate(as.Date, mean) %>%
    (function(x) merge(x, zoo(, seq(start(x), end(x), by = "day")), fill = NaN))
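For comparison, the same grid-and-merge idea can be sketched in base R without extra packages, by letting merge() with all.x = TRUE do the matching step from the question. This is an illustrative sketch on made-up data, not part of the original answers; the names grid and df2 are used only for the example.

```r
# Hypothetical toy data: five observations, none on 2000-01-03
df <- data.frame(
  datetime = as.POSIXct(c("2000-01-01 03:00", "2000-01-01 18:00",
                          "2000-01-02 06:00", "2000-01-04 09:00",
                          "2000-01-04 12:00"), tz = "UTC"),
  y = c(1, 3, 5, 10, 20)
)
df$date <- as.Date(df$datetime)

# Daily means for the dates that have data
daymean <- aggregate(y ~ date, data = df, FUN = mean)

# Full daily grid from the first to the last date
grid <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))

# Left join: every grid date is kept, dates without data get NA
df2 <- merge(grid, daymean, by = "date", all.x = TRUE)
```

Unlike the fill = NaN used in the zoo version, the missing days come out as NA here.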

REVISED. The new understanding is that we wish to create an object like df2. Added the magrittr approach. Some minor improvements.
