Convert data frame to time series suitable for auto.arima
I have the following data frame:
read.csv(file="CNY % returns.csv",head=TRUE,sep=",")
DATE LOG...RETURNS
1 03/09/13 -6.9106715
2 04/09/13 -6.9106715
3 05/09/13 -4.5839582
4 06/09/13 1.7554592
5 07/09/13 -0.8808549
6 08/09/13 4.1842420
DATE: obviously the date; format dd/mm/yy.
LOG RETURNS: compounded returns from a bitcoin CNY exchange.
I wish to use the auto.arima function as a starting point to select a suitable model.
I have already tried:
cnyX <- read.zoo(text=" DATE LOG...RETURNS
1 03/09/13 -6.9106715
2 04/09/13 -6.9106715
3 05/09/13 -4.5839582
4 06/09/13 1.7554592
5 07/09/13 -0.8808549
6 08/09/13 4.1842420")
index(cnyX) <- as.Date(as.character(index(cnyX)),format="%D%m%y")
This produces:
<NA> <NA> <NA> <NA> <NA> <NA>
0.2144527 -9.2553228 -0.8519708 -4.2074340 14.0817672 1.2212485 ....
I realise the as.character separator is incorrect but am unsure how this should be fixed. I have read about creating XTS and TS objects but have not been able to make these work either. I have also referred to "Convert data frame with date column to timeseries" but found it unsuitable.
How should I convert my data frame to a suitable format for auto.arima? I may have duplicate values present.
The problem stems from the incorrect format argument you passed to as.Date. In fact, if you ever try to convert something from character to Date and you get a vector of all NAs, you almost certainly did not specify the format correctly.
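As a minimal sketch of why the original call fails, the snippet below compares the question's format string with one that mirrors the date strings. It uses only base R and two dates from the question's sample; the variable names bad and good are illustrative. In R's strptime conversion specifications, %D is shorthand for %m/%d/%y, so it consumes the whole string and leaves nothing for the trailing %m and %y.

```r
# Two date strings taken from the question's sample (day/month/two-digit year)
dates <- c("03/09/13", "04/09/13")

# The question's format: %D already stands for a full %m/%d/%y date,
# so the trailing %m and %y have nothing left to match and parsing fails
bad <- as.Date(dates, format = "%D%m%y")

# A format that mirrors the strings: day, "/", month, "/", two-digit year
good <- as.Date(dates, format = "%d/%m/%y")

print(bad)   # both elements are NA
print(good)  # "2013-09-03" "2013-09-04"
```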
Here's a comparable data set:
Df <- data.frame(
Date = format(Sys.Date() - (729:0), "%d/%m/%y"),
LogReturns = log(rgamma(730, .25)),
stringsAsFactors = FALSE
)
Using the correct format,
ln_ret <- zoo::zoo(Df[,2], as.Date(Df[,1], format = "%d/%m/%y"))
ln_ret[1:4]
#2014-01-05 2014-01-06 2014-01-07 2014-01-08
# -2.268443 -3.562711 -4.546391 -0.707788
This will work with auto.arima:
forecast::auto.arima(ln_ret)
#Series: ln_ret
#ARIMA(0,0,0) with non-zero mean
#
#Coefficients:
# intercept
# -4.0742
#s.e. 0.1454
#
#sigma^2 estimated as 15.43: log likelihood=-2034.46
#AIC=4072.93 AICc=4072.94 BIC=4082.11
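Applied to the six sample rows from the question, the same fix might look like the sketch below. It assumes the zoo package is installed and relies on the format argument of read.zoo, which converts the index column to Date; sample_text is an illustrative name. The final auto.arima call is left commented out because six observations are too few for a meaningful fit.

```r
# The question's sample, pasted as text; the header has one fewer field than
# the data rows, so read.table treats the first column as row names
sample_text <- "  DATE LOG...RETURNS
1 03/09/13 -6.9106715
2 04/09/13 -6.9106715
3 05/09/13 -4.5839582
4 06/09/13 1.7554592
5 07/09/13 -0.8808549
6 08/09/13 4.1842420"

# Passing format makes read.zoo convert the DATE column to Date directly,
# so no separate index(cnyX) <- as.Date(...) step is needed
cnyX <- zoo::read.zoo(text = sample_text, header = TRUE, format = "%d/%m/%y")
print(cnyX)

# With the full data set (and the forecast package installed):
# forecast::auto.arima(cnyX)
```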
You don't need to worry about the correct date format if you just want to fit an ARIMA model to your log-return data. That is, you know when the time series begins and ends, and it's trivial to keep track of the dates for any forecasts, if those are ultimately desired.
This would work, too.
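# read the CSV; the second column holds the log returns
tt <- read.csv(file="CNY % returns.csv",header=TRUE,sep=",")
# assuming default options for orders p, d, q, etc.
# the series is passed positionally, so the argument name does not matter
forecast::auto.arima(tt[,2])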
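As a rough illustration of keeping track of dates by hand, the sketch below attaches calendar dates to forecasts from a model fitted on a plain numeric vector. Several things here are assumptions made for the example: the six returns are the sample rows from the question, the end date 2013-09-08 is read off that sample, the horizon h is hypothetical, and base R's stats::arima with order (0,0,0) stands in for whatever model auto.arima would select.

```r
# Log returns from the question's sample, with no date index attached
returns <- c(-6.9106715, -6.9106715, -4.5839582, 1.7554592, -0.8808549, 4.1842420)

# Stand-in model (assumption): a mean-only ARIMA(0,0,0) from base R
fit <- stats::arima(returns, order = c(0, 0, 0))

h <- 3                               # hypothetical forecast horizon, in days
last_date <- as.Date("2013-09-08")   # known end of the sample series

# Daily dates that follow the last observation
forecast_dates <- seq(last_date + 1, by = "day", length.out = h)

# Pair each forecast with its date
out <- data.frame(
  date = forecast_dates,
  forecast = as.numeric(predict(fit, n.ahead = h)$pred)
)
print(out)
```

The dates are generated from the known end point rather than parsed from the file, which is the sense in which the date format does not matter for the fit itself.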