A more efficient way to calculate a daily-mean time series that includes dates not found in the original series (in R)?

Question

I'm wondering if there is a function or package (zoo, perhaps?) that will allow me to calculate daily (or other) means of a time series against a second, independent series of dates. There are several questions on SO that deal with the creation of, e.g., daily means, but none that allow grouping by an independent series.

As of now, I have been doing this in two or more steps: first calculating means via the aggregate function, then using match to place them in a full sequence of dates. The following example is a typical situation for me, where some days do not contain any values:

set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
df <- data.frame(datetime, y)
df <- df[-sample(n, n*0.2),] #remove 20%
plot(y ~ datetime, df, t="l")

#calculate daily means
df$date <- as.Date(df$datetime)
daymean <- aggregate(y ~ date, data=df, mean)

#create daily means ts including all possible dates
date.ran <- range(df$date)
df2 <- data.frame(date=seq(date.ran[1], date.ran[2], by="days"), y=NaN)
MATCH <- match(daymean$date, df2$date)
df2$y[MATCH] <- daymean$y

plot(y ~ datetime, df, cex=0.5, pch=20)
lines(as.POSIXlt(df2$date), df2$y, t="o", col=rgb(1,0,0,0.5))
legend("topright", legend=c("Orig.", "daily mean"), col=c(1,rgb(1,0,0,0.5)), lty=c(NA, 1), pch=c(20, 1))


Answer 1

set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
#df <- data.frame(datetime, y)
#df <- df[-sample(n, n*0.2),] #remove 20%

# Set the values to NA instead of removing the rows
df <- data.frame(datetime, y)
df[sample(n, n*0.2), "y"] <- NA  # set 20% of y to NA

library(xts)
myxts <- as.xts(df$y,order.by=df$datetime)
ep <- endpoints(myxts,'days')
daymeans <- period.apply(myxts, INDEX=ep, FUN=mean, na.rm=TRUE)

plot(myxts,cex=0.5, pch=20, type="p")
lines(daymeans)
points(daymeans, col="red")


However, the resulting series is indexed by POSIXct times, which you may want to convert to dates or round to noon for plotting.
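As a rough sketch of that conversion, the snippet below re-indexes the daily means by calendar date and then merges them with an empty series covering every day, so that days without observations show up as NA. The three-point series and the names `times`, `dm_date`, `all_days` and `dm_full` are made up for illustration; they are not part of the original answer.

```r
library(xts)

# Hypothetical toy series: two observations on 1 Jan, none on 2 Jan, one on 3 Jan
times <- as.POSIXct(c("2000-01-01 06:00", "2000-01-01 18:00", "2000-01-03 12:00"),
                    tz = "GMT")
x <- xts(c(1, 3, 10), order.by = times)

# Daily means; the index is still a POSIXct time within each day
dm <- period.apply(x, INDEX = endpoints(x, "days"), FUN = mean, na.rm = TRUE)

# Re-index the means by calendar date
dm_date <- xts(coredata(dm), order.by = as.Date(index(dm), tz = "GMT"))

# Merge with a data-less series of all dates so missing days appear as NA
all_days <- seq(start(dm_date), end(dm_date), by = "day")
dm_full <- merge(dm_date, xts(order.by = all_days))
```

With real data you would apply the same two steps to `daymeans` from the code above; whether NA or NaN is the better filler depends on what the plotting code expects.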

Answer 2

1) zoo. Here is how it would be done with zoo. z2, the end result, is the series of means, one per day. We read columns 1 and 2 of df into a zoo object z and create a grid, g, which is a zero-width zoo object of dates. Then we compute the means, m, and merge the means with the grid.

library(zoo)
z <- read.zoo(df[1:2], FUN = identity)

m <- aggregate(z, as.Date, mean)
g <- zoo(, seq(start(m), end(m), by = "day"))
z2 <- merge(m, g, fill = NaN)

coredata(z2) is the data and time(z2) is the dates.
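If a plain data frame in the layout of the question's df2 is wanted, those two accessors are enough to build one. The sketch below uses a small made-up z2 (three days, the middle one NaN) as a stand-in for the real result, so it runs on its own.

```r
library(zoo)

# Hypothetical stand-in for z2: daily means with one empty day filled by NaN
z2 <- zoo(c(2, NaN, 10), as.Date("2000-01-01") + 0:2)

# Rebuild a two-column data frame: dates from time(), values from coredata()
df2 <- data.frame(date = time(z2), y = coredata(z2))
```

The column names `date` and `y` are chosen here only to mirror the question.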

2) zoo & magrittr. Another way of expressing this, using zoo and the magrittr package, is with this pipeline:

library(zoo)
library(magrittr)

df[1:2] %>%
    read.zoo(FUN = identity) %>%
    aggregate(as.Date, mean) %>%
    # an anonymous function on the right of %>% must be wrapped in parentheses
    (function(x) merge(x, zoo(, seq(start(x), end(x), by = "day")), fill = NaN))

REVISED. New understanding is that we wish to create an object like df2. Added the magrittr approach. Some minor improvements.
