
Aggregating time series in R

I have the following OHLC data (at 3-minute intervals):

library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
  Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL


> tickmin
2010-09-08 15:02:00        20
2010-09-08 15:04:00        77
2010-09-08 15:08:00        86
2010-09-08 15:11:00         7
2010-09-08 15:14:00        43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"

I am trying to aggregate it using the following:

> by <- timeSequence(from = start(tickmin), to = end(tickmin), format = "%Y-%m-%d %H%M", by = "day")
> by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]

> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE

I would appreciate any suggestions on how I can fix the error.

I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing, so make sure you read my entire answer!

The error message says that the length of your by is not the same as the length of Vo(tickmin). aggregate needs by to hold one grouping value per observation in tickmin (here, the day that observation falls on), not one value per day.

As an example, here I generate an xts object:

# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0,3600*24*30,by=60*20),origin="2010-06-30")
n   <- length(tms)
# generate volumes for those intervals, random 0 -- 100, turn into xts object
xts.ts <- xts(sample.int(100,n,replace=TRUE),tms)
colnames(xts.ts)<-'Volume'

which yields:

> head(xts.ts)
                    Volume
2010-06-30 00:00:00     97
2010-06-30 00:20:00     78
2010-06-30 00:40:00     38
2010-06-30 01:00:00     86
2010-06-30 01:20:00     79
2010-06-30 01:40:00     55

To access the dates of xts.ts you use index(xts.ts), which returns the timestamps of every observation (POSIXct values that print like "2010-07-30 00:00:00 EST").

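To reduce these to calendar days you can use as.Date. It truncates to the date rather than rounding, and for POSIXct input it uses UTC by default, which is presumably why the dates below start on 2010-06-29: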

> as.Date(index(xts.ts))
   [1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
    ....

Solution to your problem

Then, to use aggregate, you do:

> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)

2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...
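
The length requirement behind the error can be checked on a small series. This is only a sketch: the toy data, the explicit tz = "UTC", and the names vol, by_days and by_obs are assumptions made for illustration, not part of the original question.

```r
library(xts)

# Hypothetical toy series: six observations spread over two UTC days
idx <- as.POSIXct("2010-06-30 09:00:00", tz = "UTC") +
  c(0, 3600, 7200, 86400, 90000, 93600)
vol <- xts(c(10, 20, 30, 1, 2, 3), order.by = idx)
colnames(vol) <- "Volume"

# One value per day: shorter than the series, so unusable as a grouping vector
by_days <- seq(as.Date("2010-06-30"), as.Date("2010-07-01"), by = "day")
lengths_match_days <- length(by_days) == nrow(vol)

# One value per observation: same length as the series
by_obs <- as.Date(index(vol), tz = "UTC")
lengths_match_obs <- length(by_obs) == nrow(vol)

# Sum the volume within each date
daily <- aggregate(vol, by_obs, sum)
print(daily)
```

Passing tz explicitly to as.Date makes it clear which day boundaries are used for the grouping.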

Better solution to your problem

The xts package has functions apply.daily, apply.monthly, etc. (use ls('package:xts') to see what functions it has; there may be ones you're interested in).

apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily. To use it you can do:

> apply.daily(xts.ts,sum)

                    Volume
2010-06-30 23:40:00   4005
2010-07-01 23:40:00   4093
2010-07-02 23:40:00   3419
2010-07-03 23:40:00   3737
...

Or, if your xts object has other columns like Open, Close, etc., you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
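
For a multi-column object, that call might look like the sketch below. The constant OHLCV values, the UTC index and the name ohlcv are made up for illustration; Vo() comes from quantmod and picks the column whose name contains "Volume".

```r
library(xts)
library(quantmod)

# Hypothetical data: two full days of 3-minute bars (480 bars per day), UTC index
idx <- seq(as.POSIXct("2010-06-30 00:00:00", tz = "UTC"),
           by = "3 min", length.out = 2 * 480)
n <- length(idx)
ohlcv <- xts(cbind(Open   = rep(100, n),
                   High   = rep(101, n),
                   Low    = rep(99, n),
                   Close  = rep(100, n),
                   Volume = rep(2, n)),
             order.by = idx)

# Sum only the volume column within each day
daily_vol <- apply.daily(ohlcv, function(x) sum(Vo(x)))
print(daily_vol)
```

Each row of the result is stamped with the last observation of its day, as in the apply.daily output above.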

Note that apply.daily gives slightly different answers from the aggregate ... as.Date method. That's because apply.daily goes daily from start(xts.ts) to end(xts.ts) (more or less), whereas aggregate just went by day from midnight to midnight.
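
One thing that can contribute to differences of this kind is time zone handling: as.Date() on a POSIXct index uses UTC unless told otherwise, so its day boundaries need not line up with midnight in the index's own zone. The sketch below compares the groupings on made-up data; the zone "America/New_York", the constant volume of 1 and all object names are assumptions for illustration only.

```r
library(xts)

# Hypothetical series: three local days of 20-minute bars (72 bars per day)
tz  <- "America/New_York"
idx <- seq(as.POSIXct("2010-06-30 00:00:00", tz = tz),
           by = "20 min", length.out = 3 * 72)
x <- xts(rep(1, length(idx)), order.by = idx)  # each bar has volume 1
colnames(x) <- "Volume"

# Group by UTC dates (the as.Date default for POSIXct)
by_utc <- aggregate(x, as.Date(index(x), tz = "UTC"), sum)

# Group by dates in the index's own time zone
by_local <- aggregate(x, as.Date(index(x), tz = tz), sum)

# Let xts split the days itself
daily <- apply.daily(x, sum)

print(by_utc)
print(by_local)
print(daily)
```

With constant volume, the sums are simply bar counts per group, which makes any shift in the day boundaries easy to see.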

Looking at your question, apply.daily seems to match most closely what you want to do (and it is provided with xts anyway, so why not use it?).
