How do I use an input of multiple start and end dates to calculate summary statistics over specified date/time ranges in a time series?

Question

I have a (dummy) data frame with time series data:

datetime <- as.POSIXct(seq(ISOdate(2012,12,22), ISOdate(2012,12,23), by="hour"), tz='EST')
data <- rnorm(25, 10, 5)
df <- data.frame(datetime, data)

I also have a separate data frame with start and end times as its two columns:

start <- as.POSIXct(c('2012/12/22 19:53', '2012/12/22 23:05'), tz='GMT')
end <- as.POSIXct(c('2012/12/22 21:06', '2012/12/22 23:58'), tz='GMT')
index <- data.frame(start, end)

What I'd like to do is "feed" the 'index' data frame to the main data frame 'df' and, for each start and end date/time combination, find the average value of "data" within that date/time range. This would be equivalent to manually subsetting 'df' for each start/end time, but done all at once. (FYI, my real data set has years of data, and there are about a hundred date/time ranges I want to feed it.)

The end goal is to have three columns: start time, end time, and the average value of 'data' within those times.

Answer 1

In general, you don't want to grow a data frame one row at a time by calling rbind, because it is very inefficient (see the second circle of The R Inferno for details). In your case, you can use sapply to replicate this logic:

# For each row i of index, average df$data over the timestamps inside [start, end]
index$mean <- sapply(seq_len(nrow(index)), function(i)
  mean(df$data[df$datetime >= index$start[i] & df$datetime <= index$end[i]]))
index
#                 start                 end     mean
# 1 2012-12-22 19:53:00 2012-12-22 21:06:00 9.563336
# 2 2012-12-22 23:05:00 2012-12-22 23:58:00      NaN
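With years of data and a hundred ranges, each sapply call above still scans the whole of 'df'. One alternative, sketched here under the assumption that 'data' contains no NA values, is to take cumulative sums once and locate each window's boundaries with a binary search via base R's findInterval(). The helper name window_means is hypothetical, and windows with no observations return NaN, matching the sapply output.

```r
# Hypothetical helper: mean of `values` over each closed window [starts[k], ends[k]].
# Assumes `values` has no NAs. One sort plus one binary search per window,
# instead of one full scan of the data per window.
window_means <- function(times, values, starts, ends) {
  ord <- order(times)
  times <- as.numeric(times[ord])      # seconds since the epoch, independent of tz
  cs <- c(0, cumsum(values[ord]))      # cs[j + 1] = sum of the first j sorted values
  # lo = number of observations strictly before each start
  lo <- findInterval(as.numeric(starts), times, left.open = TRUE)
  # hi = number of observations at or before each end
  hi <- findInterval(as.numeric(ends), times)
  n <- hi - lo                         # observations inside each window
  ifelse(n > 0, (cs[hi + 1] - cs[lo + 1]) / n, NaN)
}
```

Usage with the question's objects would be `index$mean <- window_means(df$datetime, df$data, index$start, index$end)`. Comparing on numeric seconds sidesteps the mixed 'EST'/'GMT' time zone attributes, since POSIXct values are stored as absolute instants anyway.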

Answer 2

I figured out how to do it with a for loop. If anyone has a more efficient solution, that would be great. The for loop solution:

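d <- data.frame()
for (i in seq_len(nrow(index))) {
  # Rows of df whose timestamp lies inside the i-th [start, end] window
  in_range <- subset(df, datetime >= index$start[i] & datetime <= index$end[i])
  d <- rbind(d, data.frame(start = index$start[i], end = index$end[i],
                           mean = mean(in_range$data)))
}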
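If you prefer to keep an explicit loop, the rbind cost can be avoided by allocating the result vector once and filling it in place. This is a minimal sketch; the function name range_means_loop is hypothetical, and it assumes the column names used in the question (datetime, data, start, end).

```r
# Hypothetical helper: same subset-and-average loop, but the result vector
# is allocated once instead of growing a data frame with rbind() each time.
range_means_loop <- function(df, index) {
  means <- numeric(nrow(index))  # one slot per start/end row
  for (i in seq_len(nrow(index))) {
    in_range <- df$datetime >= index$start[i] & df$datetime <= index$end[i]
    means[i] <- mean(df$data[in_range])  # NaN when no rows fall in the window
  }
  data.frame(start = index$start, end = index$end, mean = means)
}
```

Calling `range_means_loop(df, index)` returns the three columns asked for: start, end, and mean.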

Solution 1
1 2015-06-12 19:19:51

Solution 2
0 2015-04-22 13:14:48
