Is this data fit for ARIMA modelling?
I used the data below to develop an auto ARIMA model. After looking at the results, though, I don't understand whether this data is fit for an ARIMA model. Taking third-order differences of COUNT (diff(COUNT, differences = 3)) gave a significant p-value in the augmented Dickey-Fuller test, and auto.arima() suggested the order (3,0,0). But the forecasts are not what I expected: many of them are negative, even though the actual data contain no negative values. I don't understand what the issue is. The model looks statistically correct, but the forecasts do not look right. Any help is much appreciated.
Data:
dput(Enrollment_Data)
structure(list(COUNT = c(17L, 1L, 5L, 8L, 45L, 21L, 18L, 43L,
82L, 116L, 192L, 289L, 242L, 254L, 335L, 138L, 71L, 98L, 91L,
138L, 175L, 232L, 155L, 376L, 197L, 271L, 421L), Enrolment_date = structure(c(25L,
20L, 5L, 10L, 8L, 16L, 1L, 18L, 14L, 12L, 3L, 26L, 23L, 21L,
6L, 11L, 9L, 17L, 2L, 19L, 15L, 13L, 4L, 27L, 24L, 22L, 7L), .Label = c("APR2018",
"APR2019", "AUG2018", "AUG2019", "DEC2017", "DEC2018", "DEC2019",
"FEB2018", "FEB2019", "JAN2018", "JAN2019", "JUL2018", "JUL2019",
"JUN2018", "JUN2019", "MAR2018", "MAR2019", "MAY2018", "MAY2019",
"NOV2017", "NOV2018", "NOV2019", "OCT2018", "OCT2019", "SEP2017",
"SEP2018", "SEP2019"), class = "factor")), class = "data.frame", row.names = c(NA,
-27L))
Code:
Enrollment_Data <- read.csv('EnrollmentRateT0.csv')
print(Enrollment_Data)
dput(Enrollment_Data)
#load packages
library("tseries")     # adf.test()
library("ggplot2")
library("forecast")    # auto.arima(), forecast(), accuracy()
library(FitAR)
library("fUnitRoots")
library(lmtest)
library(fpp2)
attach(Enrollment_Data)
#Step 1: Model identification
#Stationarity check: augmented Dickey-Fuller test
#p-value > 0.05: the unit-root null is not rejected, hence the data is non-stationary
d.COUNT <- diff(COUNT, differences = 3)   # third-order differences, not lag-3 differences
summary(COUNT)
summary(d.COUNT)
plot(d.COUNT)
adf.test(d.COUNT, alternative="stationary")
acf(d.COUNT)
pacf(d.COUNT)
#Step 2: Model estimation
auto.arima(d.COUNT)
auto.arima(d.COUNT, stepwise = FALSE, approximation = FALSE)
arima.final <- auto.arima(d.COUNT, stepwise = FALSE, approximation = FALSE, D=1)
#Step 3: Diagnosis
tsdiag(arima.final)
arima.final
#Choose the one that has the least AIC and significant coefficients
#arima.final <-arima(COUNT, c(3,3,1))
forecast1 <- forecast(arima.final,h = 12)
forecast1
plot(forecast1)
class(forecast1)
print(forecast1)
summary(forecast1)
accuracy(forecast1)
plot(d.COUNT)
p <- predict(arima.final, n.ahead = 12)
f <- forecast(arima.final, h = 12)
all.equal(f$mean, p$pred)
accuracy(f)
p
f
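In the code above, diff(COUNT, differences = 3) computes third-order differences, that is, diff() applied three times in a row. That is not the same as a lag-3 difference, COUNT[t] - COUNT[t - 3], which would be diff(COUNT, lag = 3). A minimal base-R sketch on the first eight COUNT values (the vector names are illustrative) shows the two give different series:

```r
# First eight COUNT values, copied from the dput() output in the question
x <- c(17, 1, 5, 8, 45, 21, 18, 43)

# Third-order differencing, as in diff(COUNT, differences = 3):
# equivalent to applying diff() three times
third_order <- diff(x, differences = 3)
by_hand <- diff(diff(diff(x)))

# Lag-3 differencing: x[t] - x[t - 3]
lag3 <- diff(x, lag = 3)

print(third_order)  # values: -21 35 -95 82 7
print(by_hand)      # same values as third_order
print(lag3)         # values: -9 44 16 10 -2
```

Both reduce the series length by three, but only the first matches what the code feeds to auto.arima().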
Results:
   Point Forecast      Lo 80    Hi 80     Lo 95     Hi 95
25  -234.78798559 -376.20497 -93.3710 -451.0666 -18.50937
26   248.28301149  -21.68036 518.2464 -164.5903 661.15636
27    38.07516814 -281.53132 357.6817 -450.7208 526.87112
28  -278.77782716 -600.00425  42.4486 -770.0513 212.49560
29   251.40378400  -74.76879 577.5764 -247.4341 750.24168
30   -31.49668698 -359.73170 296.7383 -533.4888 470.49545
31  -144.02466378 -474.75484 186.7055 -649.8328 361.78350
32   130.22859430 -211.26598 471.7232 -392.0423 652.49947
33    13.52166802 -332.92417 359.9675 -516.3215 543.36485
34  -123.35180366 -469.81119 223.1076 -653.2157 406.51210
35   103.92492852 -244.63788 452.4877 -429.1559 637.00574
36    -0.06911659 -349.40010 349.2619 -534.3247 534.18651
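The forecasts are numbered 25 to 36 because the model was fitted to d.COUNT, which has 27 - 3 = 24 observations, rather than to COUNT. The base-R sketch below rebuilds that series from the dput() values (the names count, d_count and rebuilt are hypothetical and avoid relying on attach()) and checks its sign and its relation to the original counts:

```r
# COUNT values from the dput() output in the question (27 observations)
count <- c(17, 1, 5, 8, 45, 21, 18, 43, 82, 116, 192, 289, 242, 254, 335,
           138, 71, 98, 91, 138, 175, 232, 155, 376, 197, 271, 421)

# The series that auto.arima() was actually given in the question's code
d_count <- diff(count, differences = 3)

length(d_count)    # 24, so the 12 forecasts are numbered 25 to 36
sum(d_count < 0)   # 13 of the 24 third differences are negative
range(d_count)     # -698 653

# Differencing is reversible: diffinv() with the first three original values
# as starting points rebuilds COUNT exactly. Forecasts of d_count would need
# the same treatment (appended to d_count before calling diffinv()) to be
# read as counts.
rebuilt <- diffinv(d_count, differences = 3, xi = count[1:3])
all.equal(as.numeric(rebuilt), count)   # TRUE
```

Because the differenced series swings between large negative and positive values, forecasts of it naturally do too; they only become counts after being integrated back.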
Answer:
You're running auto.arima() on d.COUNT, which holds the third-order differences of the original Enrollment_Data$COUNT, so the model and its forecasts are for the differenced series, not for the counts themselves. That d.COUNT does contain lots of negative values. I believe you want to run auto.arima() on Enrollment_Data$COUNT instead and let it choose the order of differencing d itself; the forecasts are then returned on the original count scale.
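Following that suggestion, the model is fitted to the counts themselves, with differencing handled inside the model, so the forecasts come back in count units. With the forecast package, auto.arima(Enrollment_Data$COUNT) does this and picks p, d and q itself. To keep the sketch dependency-free, the version below uses base R's stats::arima() with a hand-picked order c(1, 1, 0); that order is an illustrative assumption, not a recommended model for this data:

```r
# COUNT values from the dput() output in the question
count <- c(17, 1, 5, 8, 45, 21, 18, 43, 82, 116, 192, 289, 242, 254, 335,
           138, 71, 98, 91, 138, 175, 232, 155, 376, 197, 271, 421)

# Fit on the original series; d = 1 tells arima() to difference internally.
# The order c(1, 1, 0) is only an illustrative assumption; with the forecast
# package, auto.arima(count) would choose p, d and q itself.
fit <- arima(count, order = c(1, 1, 0), method = "ML")

# predict() undoes the differencing, so forecasts are in count units
fc <- predict(fit, n.ahead = 12)
round(fc$pred)
```

ARIMA forecasts are not constrained to be non-negative in general, so other orders or data can still produce values below zero on the original scale.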