R timeSeries aggregate() function does not include the first minute of the hour

Question

I am trying to aggregate data from a timeSeries object using the timeSeries package in R. I wrote some basic example code for reference:

library(timeSeries)
library(timeDate)
BD <- as.timeDate(paste("2015-01-01", "00:00:00")) # Creates a timeDate.
ED <- as.timeDate(paste("2015-01-31", "23:59:00")) # Creates a timeDate.
DR <- seq(BD, ED, by = 60) # Creates a sequence by minutes in between the 2 dates.

data <- runif(length(DR), 0, 100) # Creating random sample data.

x <- timeSeries(data, DR) # Initializing a timeSeries object from data and DR.
colnames(x) <- "Data" # Renaming column.

by = timeSequence(BD, ED, by = "hour") # Setting the sequence to be aggregated on.
x.agg <- timeSeries::aggregate(x, by, sum) # Aggregating on that sequence.

After running the code, the head of x.agg looks like this:

> head(x.agg)
GMT
                          Data
2015-01-01 00:00:00   29.71688
2015-01-01 01:00:00 3129.84860
2015-01-01 02:00:00 2398.92438
2015-01-01 03:00:00 3134.78608
2015-01-01 04:00:00 2743.79543
2015-01-01 05:00:00 3159.38404

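Note that the first value, "2015-01-01 00:00:00", is significantly smaller than the other hourly sums. In fact, it is exactly the first data point of the original sample: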

> head(x)
GMT
                        Data
2015-01-01 00:00:00 29.71688
2015-01-01 00:01:00 38.73175
2015-01-01 00:02:00  1.01945
2015-01-01 00:03:00 89.64938
2015-01-01 00:04:00 34.23608
2015-01-01 00:05:00 90.48571

Investigating where the sums come from, the aggregate for the "2015-01-01 01:00:00" hour is the sum of all observations from "2015-01-01 00:01:00" through "2015-01-01 01:00:00", inclusive, as this code shows:

> sum(x[2:61,])
[1] 3129.849

> x.agg[2,]
GMT
                        Data
2015-01-01 01:00:00 3129.849

What I need is for each aggregate to sum all the data points that fall within its own hour. That is, the aggregate for "2015-01-01 00:00:00" should equal:

> sum(x[1:60,])
[1] 3065.829

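That sum includes the first minute of the hour itself, rather than the first minute of the next hour (which is what the aggregate operation does). It seems the aggregate function does not treat the first minute of an hour as part of that hour, which I find strange. Any help would be greatly appreciated.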

Answer 1

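It seems I found the answer to my own question; it involves modifying the source code of the timeSeries::aggregate() function. To get the behavior described in the question, start with the source code of the timeSeries package. The easiest way to get it is to download the tar.gz file from CRAN here: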

https://cran.r-project.org/web/packages/timeSeries/index.html

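Extract the file, then go into the R folder inside the timeSeries folder. Find the "stats-aggregate.R" file and open it in R. In it you will see the .aggregate.timeSeries function. Inside that function, the change needed to get the result I wanted is to remove the +1 from lines 80 and 81. After doing that, the aggregate function aggregates the way I want.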
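To see why those two +1 offsets matter, here is a minimal base R sketch using plain numeric seconds instead of timeDate objects. It assumes the original lines have the shape `findInterval(positions, by + 1)` followed by `INDEX + 1`, which is inferred from the description above and not copied from the package source; `pos`, `brk`, `idx_orig` and `idx_mod` are illustrative names.

```r
# One observation per minute for two hours, in seconds: 0, 60, ..., 7140.
pos <- seq(0, by = 60, length.out = 120)

# Hourly breakpoints in seconds (00:00:00 and 01:00:00).
brk <- c(0, 3600)

# Assumed original grouping: both "+ 1" offsets present.
# An observation exactly on a breakpoint is assigned to the interval ending there.
idx_orig <- findInterval(pos, brk + 1) + 1

# Modified grouping: both "+ 1" offsets removed.
# An observation exactly on a breakpoint starts a new interval.
idx_mod <- findInterval(pos, brk)

table(idx_orig)
table(idx_mod)
```

With the offsets, group 1 holds only the 00:00:00 observation and group 2 holds 00:01:00 through 01:00:00, which matches the output in the question. Group 3 has no matching breakpoint, and the `is.na(INDEX) <- !(INDEX <= length(by))` line in the function marks such indices as NA. Without the offsets, each group holds the 60 observations that start at its breakpoint.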

Here is the modified function as text (I also changed the name):

`modTSAgg <- 
 function(x, by, FUN, ...)
{
# A function implemented by Yohan Chalabi and Diethelm Wuertz

# Description:
#   Aggregates a 'timeSeries' object

# Details:
#   This function can be used to aggregate and coarsen a
#   'timeSeries' object.

# Arguments:
#   x - a 'timeSeries' object to be aggregated
#   by - a calendarical block
#   FUN - function to be applied, by default 'colMeans'
#   ... - additional argument to be passed to the newly generated
#       'timeSeries' object

# Value:
#   Returns a S4 object of class 'timeSeries'.

# Examples:
# Quarterly Aggregation:
#   m = matrix(rep(1:12,2), ncol = 2)
#   ts = timeSeries(m, timeCalendar())
#   Y = getRmetricsOptions("currentYear"); Y
#   from = paste(Y, "04-01", sep = "-"); to = paste(Y+1, "01-01", sep = "-")
#   by = timeSequence(from, to, by = "quarter") - 24*3600; by
#   ts; aggregate(ts, by, sum)
# Weekly Aggregation:
#   dates = timeSequence(from = "2009-01-01", to = "2009-02-01", by = "day")
#   data = 10 * round(matrix(rnorm(2*length(dates)), ncol = 2), 1); data
#   ts = timeSeries(data = data, charvec = dates)
#   by = timeSequence(from = "2009-01-08",  to = "2009-02-01", by = "week")
#   by = by - 24*3600; aggregate(ts, by, sum)

# FUNCTION:

# Check Arguments:
if (!((inherits(by, "timeDate") && x@format != "counts") ||
      (is.numeric(by) && x@format == "counts")))
  stop("'by' should be of the same class as 'time(x)'", call.=FALSE)

# Extract Title and Documentation:
Title <- x@title
Documentation <- x@documentation

# Make sure that x is sorted:
if (is.unsorted(x))
  x <- sort(x)

# Sort and remove double entries in by:
by <- unique(sort(by))

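# Modified: the "+ 1" was removed from each of the next two lines, so an
# observation that falls exactly on a 'by' boundary starts a new group.
INDEX <- findInterval(x@positions, as.numeric(by, "sec"))
INDEX <- INDEX  # no-op, left in place of the line that carried the second "+ 1"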
is.na(INDEX) <- !(INDEX <= length(by))

# YC : ncol important to avoid problems of dimension dropped by apply
data <- matrix(apply(getDataPart(x), 2, tapply, INDEX, FUN), ncol=ncol(x))
rownames(data) <- as.character(by[unique(na.omit(INDEX))])
colnames(data) <- colnames(x)
ans <- timeSeries(data, ...)

# Preserve Title and Documentation:
ans@title <- Title
ans@documentation <- Documentation

# Return Value:
ans

  }


setMethod("aggregate", "timeSeries", function(x, by, FUN, ...)
  modTSAgg(x, by, FUN, ...))


# until UseMethod dispatches S4 methods in 'base' functions
 aggregate.timeSeries <- function(x, ...) modTSAgg(x, ...)`
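
If patching the package is not an option, the same left-closed hourly grouping can be sketched in base R alone. This is a different technique from the modification above, not part of the timeSeries package, and the names `times`, `values`, `hour_label` and `hourly` are made up for the illustration.

```r
# Illustrative data: three hours of one-minute observations.
set.seed(42)
times <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "GMT"),
             as.POSIXct("2015-01-01 02:59:00", tz = "GMT"),
             by = 60)
values <- runif(length(times), 0, 100)

# Label every observation with the start of the hour it falls in.
hour_label <- format(times, "%Y-%m-%d %H:00:00")

# Sum within each label; minute 00 is grouped with minutes 01 to 59
# of the same hour.
hourly <- tapply(values, hour_label, sum)
hourly
```

Here the first hourly total equals `sum(values[1:60])`, the analogue of `sum(x[1:60,])` in the question. The result is a named array, not a timeSeries object, so it would need converting back if later code expects one.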

Answer 1
0 · Accepted · 2015-08-07 15:36:37
