
First data point not considered in R timeSeries when averaging using `aggregate()`; how to correctly employ the function?

I want to construct daily averages for hourly electricity prices from the NordPool market. I am using the aggregate() method from the timeSeries package to construct the daily means from this hourly data, which I've converted to a timeSeries object. Here is a dput() of the first 72 hours:

    > dput(tstSeries)
    new("timeSeries"
    , .Data = structure(c(31.05, 30.47, 28.92, 27.88, 26.96, 27.84, 28.79, 
    28.63, 28.44, 28.3, 30.65, 31.55, 32.16, 32.45, 32.63, 33.65, 
    34.9, 36.22, 36.65, 36.37, 35.49, 34.41, 34.66, 32.55, 33.15, 
    32.66, 31.83, 31.47, 32.56, 34.36, 36.28, 38.39, 39.09, 38.33, 
    38.42, 38.25, 37.96, 37.89, 37.88, 38.78, 39.83, 39.91, 39.32, 
    38.49, 37.46, 36.94, 36.37, 34.59, 33.11, 32.22, 31.46, 31.67, 
    32.05, 33.67, 34.93, 35.82, 36.38, 36.52, 36.71, 36.6, 36.51, 
    36.4, 36.42, 36.58, 36.94, 36.94, 36.81, 36.43, 35.91, 35.45, 
    34.77, 32.09), .Dim = c(72L, 1L), .Dimnames = list(NULL, "TS.1"))
    , units = "TS.1"
    , positions = c(1356998400, 1357002000, 1357005600, 1357009200, 1357012800, 
    1357016400, 1357020000, 1357023600, 1357027200, 1357030800, 1357034400, 
    1357038000, 1357041600, 1357045200, 1357048800, 1357052400, 1357056000, 
    1357059600, 1357063200, 1357066800, 1357070400, 1357074000, 1357077600, 
    1357081200, 1357084800, 1357088400, 1357092000, 1357095600, 1357099200, 
    1357102800, 1357106400, 1357110000, 1357113600, 1357117200, 1357120800, 
    1357124400, 1357128000, 1357131600, 1357135200, 1357138800, 1357142400, 
    1357146000, 1357149600, 1357153200, 1357156800, 1357160400, 1357164000, 
    1357167600, 1357171200, 1357174800, 1357178400, 1357182000, 1357185600, 
    1357189200, 1357192800, 1357196400, 1357200000, 1357203600, 1357207200, 
    1357210800, 1357214400, 1357218000, 1357221600, 1357225200, 1357228800, 
    1357232400, 1357236000, 1357239600, 1357243200, 1357246800, 1357250400, 
    1357254000)
    , format = "%Y-%m-%d %H:%M:%S"
    , FinCenter = "GMT"
    , recordIDs = structure(list(), .Names = character(0), row.names = integer(0), class = "data.frame")
    , title = "Time Series Object"
    , documentation = "Wed May 20 11:02:09 2015"
    )

To do the averaging, I do the following:

    ## daily averaging
    bydaily = timeSequence(from = start(tstSeries), to = end(tstSeries), by = "day")
    tstSeries.daily = aggregate(tstSeries, by = bydaily, FUN = mean)

The output I get is:

    > tstSeries.daily
    GMT
                   TS.1
    2013-01-01 31.05000
    2013-01-02 31.82167
    2013-01-03 36.67375

Here, the first daily "average" is simply the first original data point! I repeated the calculation in Excel and confirmed that the first data point is left out of the averaging. Instead, the value reported for 2013-01-02 is the average of the 24 observations from 2013-01-01 01:00 to 2013-01-02 00:00, so every daily window is shifted forward by one hour.
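The shift can be checked with plain arithmetic in base R, without the timeSeries package. The sketch below copies the first 25 hourly prices from the dput() above; the variable names are illustrative only.

```r
# First 25 hourly prices: 2013-01-01 00:00 through 2013-01-02 00:00 (GMT).
prices <- c(31.05, 30.47, 28.92, 27.88, 26.96, 27.84, 28.79,
            28.63, 28.44, 28.3, 30.65, 31.55, 32.16, 32.45, 32.63, 33.65,
            34.9, 36.22, 36.65, 36.37, 35.49, 34.41, 34.66, 32.55, 33.15)

# Calendar-day mean for 2013-01-01: observations at 00:00 ... 23:00.
calendar_day <- mean(prices[1:24])

# Shifted window: observations at 01:00 ... 00:00 of the next day.
shifted_day <- mean(prices[2:25])

print(round(calendar_day, 5))  # 31.73417, the desired daily mean
print(round(shifted_day, 5))   # 31.82167, the value aggregate() reported for 2013-01-02
```

The second value matches the aggregate() output, which is consistent with each daily bin ending at a midnight break point and including it, rather than starting at one.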

I've seen several examples demonstrating the use of aggregate() but have not come across any which raise this issue. Has anyone seen this happen and is there a work-around?

Here is a solution that returns the desired output. It relies on the apply.rolling() function from the PerformanceAnalytics package.

    library(PerformanceAnalytics)

    # mean of each consecutive, non-overlapping 24-hour window
    tstSeries.daily <- apply.rolling(tstSeries, width = 24, by = 24, FUN = "mean")
    # drop the rows that contain NAs
    tstSeries.daily <- tstSeries.daily[complete.cases(tstSeries.daily), ]
    # remove the time part of the index
    rownames(tstSeries.daily) <- as.Date(rownames(tstSeries.daily))
    print(tstSeries.daily)
    GMT 
                  calcs
    2013-01-01 31.73417
    2013-01-02 36.67542
    2013-01-03 35.09958
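As a cross-check that does not depend on fixed 24-row windows, the same daily means can be obtained in base R by grouping each observation by its calendar date. This is only a sketch on plain vectors rebuilt from the dput() above (the names prices, times and daily are illustrative), not a drop-in replacement for the timeSeries workflow.

```r
# Hourly prices copied from the dput() in the question (72 observations).
prices <- c(31.05, 30.47, 28.92, 27.88, 26.96, 27.84, 28.79,
            28.63, 28.44, 28.3, 30.65, 31.55, 32.16, 32.45, 32.63, 33.65,
            34.9, 36.22, 36.65, 36.37, 35.49, 34.41, 34.66, 32.55, 33.15,
            32.66, 31.83, 31.47, 32.56, 34.36, 36.28, 38.39, 39.09, 38.33,
            38.42, 38.25, 37.96, 37.89, 37.88, 38.78, 39.83, 39.91, 39.32,
            38.49, 37.46, 36.94, 36.37, 34.59, 33.11, 32.22, 31.46, 31.67,
            32.05, 33.67, 34.93, 35.82, 36.38, 36.52, 36.71, 36.6, 36.51,
            36.4, 36.42, 36.58, 36.94, 36.94, 36.81, 36.43, 35.91, 35.45,
            34.77, 32.09)

# The positions slot is an hourly grid starting at 1356998400
# (2013-01-01 00:00:00 GMT), so it can be rebuilt arithmetically.
times <- as.POSIXct(1356998400 + 3600 * (0:71),
                    origin = "1970-01-01", tz = "GMT")

# Label every observation with its calendar date in GMT, then average per label.
day   <- format(times, "%Y-%m-%d", tz = "GMT")
daily <- tapply(prices, day, mean)

print(round(daily, 5))  # approx. 31.73417, 36.67542, 35.09958
```

Grouping by date label keeps working if a day has missing or extra hours, whereas the fixed-width window approach assumes exactly 24 rows per day.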
