
More efficient way to calculate a daily means time series which includes dates not found in the original series (in R)?

I'm wondering if there is a function or package (zoo, perhaps?) that will let me calculate daily (or other) means of a time series against a second, independent series of dates. There are several questions on SO that deal with creating e.g. daily means, but none that allow grouping by an independent series.

So far I have been doing this in two or more steps: first calculating the means with the aggregate function, then using match to place them into a full sequence of dates. The following example is a typical situation for me, where some days do not contain any values:

set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
df <- data.frame(datetime, y)
df <- df[-sample(n, n*0.2),] #remove 20%
plot(y ~ datetime, df, t="l")

#calculate daily means
df$date <- as.Date(df$datetime)
daymean <- aggregate(y ~ date, data=df, mean)

#create daily means ts including all possible dates
date.ran <- range(df$date)
df2 <- data.frame(date=seq(date.ran[1], date.ran[2], by="days"), y=NaN)
MATCH <- match(daymean$date, df2$date)
df2$y[MATCH] <- daymean$y

plot(y ~ datetime, df, cex=0.5, pch=20)
lines(as.POSIXlt(df2$date), df2$y, t="o", col=rgb(1,0,0,0.5))
legend("topright", legend=c("Orig.", "daily mean"), col=c(1,rgb(1,0,0,0.5)), lty=c(NA, 1), pch=c(20, 1))


set.seed(1)
n <- 500
x <- cumsum(runif(n, min=99360*0.1, max=99360*2))
datetime <- as.POSIXlt(x, origin="2000-01-01", tz="GMT")
y <- cumsum(runif(n, min=-1, max=1))
#df <- data.frame(datetime, y)
#df <- df[-sample(n, n*0.2),] #remove 20%

# Set the values to NA instead of removing the rows
df <- data.frame(datetime, y)
df[sample(n, n*0.2), "y"] <- NA  # set 20% of the values to NA

library(xts)
myxts <- as.xts(df$y,order.by=df$datetime)
ep <- endpoints(myxts,'days')
daymeans <- period.apply(myxts, INDEX=ep, FUN=mean, na.rm=TRUE)

plot(myxts,cex=0.5, pch=20, type="p")
lines(daymeans)
points(daymeans, col="red")


However, the result is indexed by POSIXct times, which you may want to convert to dates or round to noon for plotting.
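The conversion to dates mentioned above could look roughly like the sketch below. It uses a small made-up series (the names tt, toy, toy_means and by_date are illustrative and not from the original answer) and assumes GMT timestamps, as in the example data.

```r
library(xts)

# Hypothetical toy data: irregular GMT timestamps spread over three days
tt <- as.POSIXct(c("2000-01-01 10:00", "2000-01-01 14:00",
                   "2000-01-02 12:00", "2000-01-03 13:00"), tz = "GMT")
toy <- xts(c(1, 3, 5, 7), order.by = tt)

# Daily means, computed the same way as in the answer
toy_means <- period.apply(toy, INDEX = endpoints(toy, "days"), FUN = mean)

# Replace the POSIXct index with plain calendar dates
by_date <- toy_means
index(by_date) <- as.Date(index(toy_means), tz = "GMT")
```

Passing tz explicitly to as.Date is a deliberate choice here: it keeps the calendar day aligned with the time zone of the series rather than relying on a default.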

1) zoo Here is how it would be done with zoo. z2, the end result, is the series of means, one per day. We read columns 1 and 2 of df into a zoo object z, compute the daily means m, create a grid g (a zero-width zoo object holding every date from the first to the last), and merge the means with the grid, filling the days that have no data with NaN.

library(zoo)
z <- read.zoo(df[1:2], FUN = identity)

m <- aggregate(z, as.Date, mean)
g <- zoo(, seq(start(m), end(m), by = "day"))
z2 <- merge(m, g, fill = NaN)

coredata(z2) is the data and time(z2) is the dates.
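If a plain data frame shaped like df2 from the question is wanted, those two accessors can be combined. The following is a minimal sketch on a made-up three-point series (d, m and df2_like are illustrative names, not part of the original answer):

```r
library(zoo)

# Hypothetical toy series of daily means with a gap on 2000-01-03
d  <- as.Date(c("2000-01-01", "2000-01-02", "2000-01-04"))
m  <- zoo(c(1.5, 2.5, 4.0), d)

# Zero-width grid of every date in the range, then merge as in the answer
g  <- zoo(, seq(start(m), end(m), by = "day"))
z2 <- merge(m, g, fill = NaN)

# Rebuild a two-column data frame: one row per calendar day
df2_like <- data.frame(date = time(z2), y = as.numeric(coredata(z2)))
```

The as.numeric call is only a precaution so that y is a plain numeric column regardless of the shape merge returns.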

2) zoo & magrittr Another way of expressing this, using zoo and the magrittr package, is with this pipeline:

library(zoo)
library(magrittr)

df[1:2] %>%
    read.zoo(FUN = identity) %>%
    aggregate(as.Date, mean) %>%
    # an anonymous function on the right-hand side of %>% must be parenthesized
    (function(x) merge(x, zoo(, seq(start(x), end(x), by = "day")), fill = NaN))

REVISED. New understanding is that we wish to create an object like df2. Added magrittr approach. Some minor improvements.
