R timeSeries aggregate() Function Not Including First Minute of Hour

I'm attempting to use the timeSeries package in R to aggregate data from a timeSeries object. I wrote some basic sample code for reference:

library(timeSeries)
library(timeDate)
BD <- as.timeDate(paste("2015-01-01", "00:00:00")) # Creates a timeDate.
ED <- as.timeDate(paste("2015-01-31", "23:59:00")) # Creates a timeDate.
DR <- seq(BD, ED, by = 60) # Creates a sequence by minutes in between the 2 dates.

data <- runif(length(DR), 0, 100) # Creating random sample data.

x <- timeSeries(data, DR) # Initializing a timeSeries object from data and DR.
colnames(x) <- "Data" # Renaming column.

by = timeSequence(BD, ED, by = "hour") # Setting the sequence to be aggregated on.
x.agg <- timeSeries::aggregate(x, by, sum) # Aggregating on that sequence.

After running the code, head(x.agg) looks like this:

> head(x.agg)
GMT
                          Data
2015-01-01 00:00:00   29.71688
2015-01-01 01:00:00 3129.84860
2015-01-01 02:00:00 2398.92438
2015-01-01 03:00:00 3134.78608
2015-01-01 04:00:00 2743.79543
2015-01-01 05:00:00 3159.38404

Notice that the first value, for "2015-01-01 00:00:00", is far smaller than the other hourly sums. In fact, it is exactly the first data point of the original sample:

> head(x)
GMT
                        Data
2015-01-01 00:00:00 29.71688
2015-01-01 00:01:00 38.73175
2015-01-01 00:02:00  1.01945
2015-01-01 00:03:00 89.64938
2015-01-01 00:04:00 34.23608
2015-01-01 00:05:00 90.48571

Investigating where the sums come from, I found that the aggregate for the "2015-01-01 01:00:00" hour is the sum of all points from "2015-01-01 00:01:00" through "2015-01-01 01:00:00" inclusive, as shown here:

> sum(x[2:61,])
[1] 3129.849

> x.agg[2,]
GMT
                        Data
2015-01-01 01:00:00 3129.849

What I need is for the aggregation to sum all the data points within the "00:00:00" hour. That is, the aggregate for "2015-01-01 00:00:00" should equal:

> sum(x[1:60,])
[1] 3065.829

including the first minute of that hour, and not the first minute of the next hour as aggregate() currently does. It seems the aggregation function treats the first minute of an hour as not belonging to that hour, which I find very strange. Any help would be greatly appreciated.
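For comparison, the grouping asked for above (each minute assigned to the hour it falls in) can be sketched in base R without timeSeries, by flooring each timestamp to the start of its hour. This is only an illustrative sketch under assumptions: it uses POSIXct instead of timeDate, covers a single day, and the names times, values, hour_start, hourly_sum and hourly_n are made up for the example.

```r
# Minute-by-minute timestamps for one day (UTC), plus random sample data.
set.seed(1)
times <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2015-01-01 23:59:00", tz = "UTC"),
             by = 60)
values <- runif(length(times), 0, 100)

# Floor each timestamp to the start of its hour (3600 seconds per hour).
hour_start <- as.POSIXct(floor(as.numeric(times) / 3600) * 3600,
                         origin = "1970-01-01", tz = "UTC")
hour_label <- format(hour_start, "%Y-%m-%d %H:%M:%S")

# Sum and count the values within each hour; names are the hour start times.
hourly_sum <- tapply(values, hour_label, sum)
hourly_n   <- tapply(values, hour_label, length)

# The first bucket covers 00:00 through 00:59, i.e. the first 60 points.
all.equal(hourly_sum[[1]], sum(values[1:60]))
```

With this grouping every hour, including the first, contains 60 points, which is the behaviour the question is looking for from aggregate().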

It seems I found an answer to my own question, and it involves modifying the source code of the timeSeries::aggregate() function. To achieve what I wanted in the question above, get the source code of the timeSeries package, which is most easily done by downloading the tar.gz file from CRAN here:

https://cran.r-project.org/web/packages/timeSeries/index.html

Extract the archive and go to the R folder inside the timeSeries folder. Find the "stats-aggregate.R" file and open it in R. In it, you'll see the .aggregate.timeSeries function. To get the result I wanted, the +1's need to be removed from lines 80 and 81. After doing so, the aggregate function aggregates the way I wanted it to.
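The effect of those two +1's can be sketched with base R's findInterval() on plain numeric seconds. This is an illustration under assumptions: the "original" form below is reconstructed from the description above (one +1 on the break points, one on the resulting index) rather than copied from the package source, and the names pos, by_sec, idx_orig and idx_mod are hypothetical.

```r
# Positions: one point per minute for three hours, in seconds from the start.
pos <- seq(0, 3 * 3600 - 60, by = 60)
# Break points: the start of each hour, like timeSequence(..., by = "hour").
by_sec <- seq(0, 2 * 3600, by = 3600)

# Assumed original indexing: shifting the breaks by one second and the index
# by one makes each block right-closed, (by[i-1], by[i]], labelled by[i].
idx_orig <- findInterval(pos, by_sec + 1) + 1
idx_orig[idx_orig > length(by_sec)] <- NA   # points after the last break

# Modified indexing: each block is left-closed, [by[i], by[i+1]), labelled by[i].
idx_mod <- findInterval(pos, by_sec)

table(idx_orig)   # block 1 holds only the 00:00 point
table(idx_mod)    # every block holds a full 60 minutes
```

Under the assumed original indexing the first block contains a single point and later blocks run from minute :01 to the next :00, which matches the output shown in the question. Without the +1's, each block starts at :00 and ends at :59.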

Here is the modified function as text (I changed its name as well):

`modTSAgg <- 
 function(x, by, FUN, ...)
{
# A function implemented by Yohan Chalabi and Diethelm Wuertz

# Description:
#   Aggregates a 'timeSeries' object

# Details:
#   This function can be used to aggregate and coarsen a
#   'timeSeries' object.

# Arguments:
#   x - a 'timeSeries' object to be aggregated
#   by - a calendarical block
#   FUN - function to be applied, by default 'colMeans'
#   ... - additional argument to be passed to the newly generated
#       'timeSeries' object

# Value:
#   Returns a S4 object of class 'timeSeries'.

# Examples:
# Quarterly Aggregation:
#   m = matrix(rep(1:12,2), ncol = 2)
#   ts = timeSeries(m, timeCalendar())
#   Y = getRmetricsOptions("currentYear"); Y
#   from = paste(Y, "04-01", sep = "-"); to = paste(Y+1, "01-01", sep = "-")
#   by = timeSequence(from, to, by = "quarter") - 24*3600; by
#   ts; aggregate(ts, by, sum)
# Weekly Aggregation:
#   dates = timeSequence(from = "2009-01-01", to = "2009-02-01", by = "day")
#   data = 10 * round(matrix(rnorm(2*length(dates)), ncol = 2), 1); data
#   ts = timeSeries(data = data, charvec = dates)
#   by = timeSequence(from = "2009-01-08",  to = "2009-02-01", by = "week")
#   by = by - 24*3600; aggregate(ts, by, sum)

# FUNCTION:

# Check Arguments:
if (!((inherits(by, "timeDate") && x@format != "counts") ||
      (is.numeric(by) && x@format == "counts")))
  stop("'by' should be of the same class as 'time(x)'", call.=FALSE)

# Extract Title and Documentation:
Title <- x@title
Documentation <- x@documentation

# Make sure that x is sorted:
if (is.unsorted(x))
  x <- sort(x)

# Sort and remove double entries in by:
by <- unique(sort(by))

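# Modified: the "+ 1" terms were removed from the next two lines, so each
# observation is assigned to the block that starts at or before it.
INDEX <- findInterval(x@positions, as.numeric(by, "sec"))
INDEX <- INDEX  # no-op left over after removing "+ 1"
# Observations indexed beyond the last element of 'by' are set to NA:
is.na(INDEX) <- !(INDEX <= length(by))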

# YC : ncol important to avoid problems of dimension dropped by apply
data <- matrix(apply(getDataPart(x), 2, tapply, INDEX, FUN), ncol=ncol(x))
rownames(data) <- as.character(by[unique(na.omit(INDEX))])
colnames(data) <- colnames(x)
ans <- timeSeries(data, ...)

# Preserve Title and Documentation:
ans@title <- Title
ans@documentation <- Documentation

# Return Value:
ans

  }


setMethod("aggregate", "timeSeries", function(x, by, FUN, ...)
  modTSAgg(x, by, FUN, ...))


# until UseMethod dispatches S4 methods in 'base' functions
 aggregate.timeSeries <- function(x, ...) modTSAgg(x, ...)`
