I'm wondering if there is a function or package (zoo?) that will allow me to calculate daily (or other) means of a time series for a second, independent series of values. There are several questions on SO that deal with the creation of e.g. daily means, but none that allow grouping by an independent series.
As of now, I have been doing this in two or more steps: first calculating the means via the aggregate function, then using match to place them in a full sequence of values. The following example is a typical situation for me, where some days do not contain any values:
set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
df <- data.frame(datetime, y)
df <- df[-sample(n, n*0.2),] #remove 20%
plot(y ~ datetime, df, t="l")
#calculate daily means
df$date <- as.Date(df$datetime)
daymean <- aggregate(y ~ date, data=df, mean)
#create daily means ts including all possible dates
date.ran <- range(df$date)
df2 <- data.frame(date=seq(date.ran[1], date.ran[2], by="days"), y=NaN)
MATCH <- match(daymean$date, df2$date)
df2$y[MATCH] <- daymean$y
plot(y ~ datetime, df, cex=0.5, pch=20)
lines(as.POSIXlt(df2$date), df2$y, t="o", col=rgb(1,0,0,0.5))
legend("topright", legend=c("Orig.", "daily mean"), col=c(1,rgb(1,0,0,0.5)), lty=c(NA, 1), pch=c(20, 1))
set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
#df <- data.frame(datetime, y)
#df <- df[-sample(n, n*0.2),] #remove 20%
#You should set the values to NA instead of removing them
df <- data.frame(datetime, y)
df[sample(n, n*0.2), "y"] <- NA #set 20% of the values to NA
library(xts)
myxts <- as.xts(df$y,order.by=df$datetime)
ep <- endpoints(myxts,'days')
daymeans <- period.apply(myxts, INDEX=ep, FUN=mean, na.rm=TRUE)
plot(myxts,cex=0.5, pch=20, type="p")
lines(daymeans)
points(daymeans, col="red")
However, the resulting series is indexed by POSIXct times, which you may want to convert to dates or round to noon for plotting.
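As a rough sketch of that conversion, the index of the daily means can be replaced by plain dates, and the result merged with a zero-width series of all dates so that empty days show up as NA. The small series below is made up for illustration, and the names daymeans_date and full are hypothetical, not part of the original code.

```r
library(xts)

# Made-up irregular series covering 2000-01-01 to 2000-01-04 (no data on 01-03)
times <- as.POSIXct(c("2000-01-01 03:00:00", "2000-01-01 18:00:00",
                      "2000-01-02 09:30:00",
                      "2000-01-04 01:00:00", "2000-01-04 23:00:00"), tz = "GMT")
vals <- c(1, 3, 10, 4, NA)
myxts <- xts(vals, order.by = times)

# Daily means, as in the answer above
ep <- endpoints(myxts, "days")
daymeans <- period.apply(myxts, INDEX = ep, FUN = mean, na.rm = TRUE)

# Rebuild the series with a Date index instead of POSIXct timestamps
daymeans_date <- xts(coredata(daymeans),
                     order.by = as.Date(index(daymeans), tz = "GMT"))

# Merge with a zero-width series of every date so missing days appear as NA
idx <- index(daymeans_date)
full <- merge(daymeans_date, xts(, seq(min(idx), max(idx), by = "day")))
```

Here full should contain one row per calendar day, with NA on the day that has no observations.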
1) zoo Here is how it would be done with zoo. z2, the end result, is the series of means, one per day. We read columns 1 and 2 of df into a zoo object z and create a grid, g, which is a zero-width zoo object of dates. Then we compute the means, m, and merge the means and the grid.
library(zoo)
z <- read.zoo(df[1:2], FUN = identity)
m <- aggregate(z, as.Date, mean)
g <- zoo(, seq(start(m), end(m), by = "day"))
z2 <- merge(m, g, fill = NaN)
coredata(z2) is the data and time(z2) is the dates.
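If a data frame shaped like df2 from the question is wanted, the two accessors can be combined. This is a minimal sketch; the z2 below is a made-up stand-in with one empty day rather than the series computed above.

```r
library(zoo)

# Made-up daily means with one empty day (NaN), standing in for z2
z2 <- zoo(c(2, 10, NaN, 4), as.Date("2000-01-01") + 0:3)

# time() gives the dates, coredata() gives the values
df2 <- data.frame(date = time(z2), y = coredata(z2))
```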
2) zoo & magrittr Another way of expressing this, using zoo and the magrittr package, is with this pipeline:
library(zoo)
library(magrittr)
df[1:2] %>%
read.zoo(FUN = identity) %>%
aggregate(as.Date, mean) %>%
(function(x) merge(x, zoo(, seq(start(x), end(x), by = "day")), fill = NaN))
REVISED. New understanding is that we wish to create an object like df2. Added magrittr approach. Some minor improvements.