
Time series forecast - ARIMA/ARIMAX with daily data in R

I am working on a project to analyse and forecast time series of a client's sales and revenue. I want to compare several models for accuracy: Holt's linear method, the Holt-Winters method, ARIMA, seasonal ARIMA, and ARIMAX (since I also want to take categorical variables in the data into account). The data is daily, so I have chosen a frequency of 7.

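# Week of the year (%W) and day-of-week position (1..7, Sunday = 1) of the first observation
startW <- as.numeric(strftime(head(revenue$date, 1), format = "%W"))
startD <- as.numeric(strftime(head(revenue$date, 1), format = "%w")) + 1
# Daily series with a weekly seasonal period
revenue <- ts(revenue$amount, start = c(startW, startD), frequency = 7)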
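For reference, here is a minimal sketch of the same construction on made-up data. The data frame `sales` and its values are simulated, and `%u` (weekday 1..7, Monday = 1) is just one possible way to get a start position that lines up with the Monday-based `%W` week number; it is an assumption, not something from the question.

```r
# Simulated daily data (hypothetical; stands in for the real revenue table)
set.seed(1)
sales <- data.frame(
  date   = seq(as.Date("2019-01-01"), by = "day", length.out = 70),
  amount = 100 + 20 * sin(2 * pi * (0:69) / 7) + rnorm(70, sd = 5)
)

first_day  <- sales$date[1]
start_week <- as.numeric(format(first_day, "%W"))  # week of year, weeks start on Monday
start_day  <- as.numeric(format(first_day, "%u"))  # weekday 1..7, Monday = 1

# frequency = 7 tells R that one seasonal cycle is a week
y <- ts(sales$amount, start = c(start_week, start_day), frequency = 7)

frequency(y)   # seasonal period of the series
head(cycle(y)) # position within the week for the first few observations
```

Printing `cycle(y)` is a quick way to check that the first observation landed on the intended day of the week.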

I then split it into train and test sets, keeping the last month as the hold-out set.
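A split like this could be sketched with `window()` from base R. The series below is simulated, and the 28-day hold-out is an assumed stand-in for "last month"; the names `train`, `test`, and `h` are illustrative only.

```r
# Simulated daily series with a weekly pattern (hypothetical data)
set.seed(2)
y <- ts(100 + 20 * sin(2 * pi * (0:139) / 7) + rnorm(140, sd = 5), frequency = 7)

h <- 28          # hold-out length in days (assumed: four weeks)
n <- length(y)

# window() keeps the ts attributes, so train and test stay aligned in time
train <- window(y, end = time(y)[n - h])
test  <- window(y, start = time(y)[n - h + 1])

length(train)  # number of training observations
length(test)   # number of hold-out observations
```

Keeping both pieces as `ts` objects means the forecast horizon starts exactly where the training data ends.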

I used the auto.arima() function for the ARIMA model, and it returns ARIMA(0,0,0)(2,1,0)[7]. What does that imply? The residual plot looks like this: [residual plot 1]

Following this, I added holidays as an exogenous variable:

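library(Matrix)    # sparse.model.matrix()
library(forecast)  # auto.arima()
# Dummy-encode the holiday factor, then drop the intercept column
encoded_regressors <- sparse.model.matrix(amount ~ holiday, data = train_set)
encoded_regressors <- as.matrix(encoded_regressors[, -1, drop = FALSE])  # xreg must be a plain numeric matrix
model2 <- auto.arima(revenue.train, xreg = encoded_regressors)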
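One thing that is easy to miss with an exogenous regressor is that forecasting needs future values of that regressor too. The sketch below uses base R's `arima()` and `predict()` rather than the forecast package so that it runs without extra packages; the data, the holiday calendar, and the fixed model orders are all made up for illustration.

```r
# Simulated data: weekly seasonal random walk plus a drop on "holidays" (hypothetical)
set.seed(3)
n <- 140   # training length
h <- 14    # forecast horizon
total <- n + h
holiday <- as.integer(seq_len(total) %% 30 == 0)   # made-up holiday flag
e <- rnorm(total, sd = 2)
base <- 100 + as.numeric(stats::filter(e, filter = c(rep(0, 6), 1), method = "recursive"))
amount <- base - 30 * holiday

y_train  <- ts(amount[1:n], frequency = 7)
x_train  <- matrix(holiday[1:n], ncol = 1, dimnames = list(NULL, "holiday"))
x_future <- matrix(holiday[(n + 1):total], ncol = 1, dimnames = list(NULL, "holiday"))

# Regression with ARIMA(0,0,1)(2,1,0)[7] errors; method = "ML" skips the CSS start-up step
fit <- arima(y_train, order = c(0, 0, 1),
             seasonal = list(order = c(2, 1, 0), period = 7),
             xreg = x_train, method = "ML")

# Future regressor values must be supplied for the forecast horizon
fc <- predict(fit, n.ahead = h, newxreg = x_future)

coef(fit)["holiday"]  # estimated holiday effect
fc$pred               # point forecasts for the next h days
```

With the forecast package, the analogous step would be passing the future regressor matrix as `xreg` to `forecast()`.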

The model I get now is ARIMA(0,0,1)(2,1,0)[7], and here is the residual plot: [residual plot 2]

In both cases, when I compare the predicted and observed values, the percentage difference typically ranges from 3% to 50%. How can I improve my model, and how do I interpret the output of the ARIMA model?
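Rather than looking at a range of per-day percentage differences, it may help to condense the hold-out errors into a few summary measures. A minimal sketch with made-up numbers (the vectors `actual` and `predicted` are hypothetical):

```r
# Made-up observed values and forecasts for four hold-out days
actual    <- c(120, 95, 130, 110)
predicted <- c(110, 100, 125, 121)

err  <- actual - predicted
mae  <- mean(abs(err))               # mean absolute error, in revenue units
rmse <- sqrt(mean(err^2))            # root mean squared error, penalises large misses
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

round(c(MAE = mae, RMSE = rmse, MAPE = mape), 2)
```

If the forecast package is loaded, `accuracy()` reports similar measures for a forecast object against a test set.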

Thanks!

You seem to be using auto.arima() from the forecast package. You can find a lot of good information about this package and about time series forecasting in R here. In the output you have given, the three values in the first set of parentheses are the orders p, d, and q of the non-seasonal part of the ARIMA model: p is the autoregressive order, d is the degree of differencing, and q is the moving-average order. The three values in the second set of parentheses are the seasonal orders P, D, and Q, which refer to the seasonal autoregressive, differencing, and moving-average terms respectively. The number 7 in the brackets is the seasonal period, i.e. the frequency that you chose. So ARIMA(0,0,0)(2,1,0)[7] has no non-seasonal terms, one seasonal difference at lag 7, and two seasonal autoregressive terms.
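To make the notation concrete, the reported specification can be fitted by hand. This is only a sketch on simulated data, using base R's `arima()` (forecast's `Arima()` takes similar `order` and `seasonal` arguments); the series `y` is made up.

```r
# Simulated seasonal random walk: each value is the value 7 days earlier plus noise
set.seed(4)
e <- rnorm(210, sd = 2)
y <- ts(100 + as.numeric(stats::filter(e, filter = c(rep(0, 6), 1), method = "recursive")),
        frequency = 7)

# ARIMA(0,0,0)(2,1,0)[7]: order = (p, d, q), seasonal order = (P, D, Q), period = 7
fit <- arima(y, order = c(0, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 7),
             method = "ML")

fit$arma   # compact spec: p, q, P, Q, period, d, D
coef(fit)  # the two seasonal AR coefficients (sar1, sar2)
```

In words, the fitted model takes the difference between each day and the same weekday one week earlier, then explains that difference using its values one and two weeks back.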

In general, to find the best ARIMA model, you would look at the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) and try to minimize it. Again, look at the link for more details.
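A comparison along these lines could be sketched as follows. The data are simulated and the three candidate specifications are arbitrary choices for illustration; note that all of them use the same differencing, since information criteria are not comparable across models with different orders of differencing.

```r
# Simulated weekly-seasonal series (hypothetical data)
set.seed(5)
e <- rnorm(210, sd = 2)
y <- ts(100 + as.numeric(stats::filter(e, filter = c(rep(0, 6), 1), method = "recursive")),
        frequency = 7)

# Candidate specifications (arbitrary examples), all with d = 0 and D = 1
candidates <- list(
  "(0,0,0)(2,1,0)[7]" = list(order = c(0, 0, 0), seasonal = c(2, 1, 0)),
  "(0,0,1)(2,1,0)[7]" = list(order = c(0, 0, 1), seasonal = c(2, 1, 0)),
  "(1,0,0)(0,1,1)[7]" = list(order = c(1, 0, 0), seasonal = c(0, 1, 1))
)

fits <- lapply(candidates, function(s) {
  arima(y, order = s$order,
        seasonal = list(order = s$seasonal, period = 7),
        method = "ML")
})

# Lower is better for both criteria
scores <- data.frame(AIC = sapply(fits, AIC), BIC = sapply(fits, BIC))
scores[order(scores$AIC), ]
```

auto.arima() performs a search of this kind automatically, by default using the corrected AIC (AICc).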

The ACF and PACF plots of the time series are as follows: [ACF plot] [PACF plot]

If my understanding is correct, the ACF suggests q = 7 and the PACF suggests p = 7?
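With daily data and a period of 7, a spike at lag 7 is more commonly read as a sign of weekly seasonality (handled by the seasonal terms or a seasonal difference) than as a non-seasonal order of 7. One way to check is to look at the ACF and PACF again after a seasonal difference. A sketch on simulated data, with the series `y` made up:

```r
# Simulated weekly-seasonal series (hypothetical data)
set.seed(6)
e <- rnorm(210, sd = 2)
y <- ts(100 + as.numeric(stats::filter(e, filter = c(rep(0, 6), 1), method = "recursive")),
        frequency = 7)

# Correlograms of the raw series and of the seasonally differenced series
acf_raw   <- acf(y, lag.max = 28, plot = FALSE)
y_sdiff   <- diff(y, lag = 7)                 # value minus value one week earlier
acf_diff  <- acf(y_sdiff, lag.max = 28, plot = FALSE)
pacf_diff <- pacf(y_sdiff, lag.max = 28, plot = FALSE)

# In the acf output, index 1 is lag 0, so index 8 is the 7-day lag
acf_raw$acf[8]
acf_diff$acf[8]
```

Calling the same functions with `plot = TRUE` draws the correlograms; on a frequency-7 series the lag axis is in weeks, so lag 1 on the plot corresponds to 7 days.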
