
Is this data fit for ARIMA modelling?

I used the data below to develop an auto ARIMA model. But after looking at the results, I don't understand whether this data is fit for an ARIMA model. Differencing the variable COUNT three times (third-order differencing with diff(COUNT, differences = 3)) gave a significant ADF test p-value, and auto.arima suggested the order (3,0,0). But the predicted values are not what I expected: most of them are negative, while the actual data contains no negative values. I don't understand what the issue is. The model looks statistically correct, but the predicted values don't look good. Any help is much appreciated.
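For reference, diff(x, differences = 3) in R applies the first difference three times; the lag-3 difference x[t] - x[t-3] would instead be diff(x, lag = 3). A small base-R check on the first six COUNT values (the names x, third_order and lag3 are only for illustration):

```r
# First six values of COUNT from the data below
x <- c(17, 1, 5, 8, 45, 21)

third_order <- diff(x, differences = 3)  # same as diff(diff(diff(x)))
lag3        <- diff(x, lag = 3)          # x[t] - x[t - 3]

third_order  # -21  35 -95
lag3         #  -9  44  16
stopifnot(identical(third_order, diff(diff(diff(x)))))
```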

Data:

dput(Enrollment_Data)
structure(list(COUNT = c(17L, 1L, 5L, 8L, 45L, 21L, 18L, 43L, 
82L, 116L, 192L, 289L, 242L, 254L, 335L, 138L, 71L, 98L, 91L, 
138L, 175L, 232L, 155L, 376L, 197L, 271L, 421L), Enrolment_date = structure(c(25L, 
20L, 5L, 10L, 8L, 16L, 1L, 18L, 14L, 12L, 3L, 26L, 23L, 21L, 
6L, 11L, 9L, 17L, 2L, 19L, 15L, 13L, 4L, 27L, 24L, 22L, 7L), .Label = c("APR2018", 
"APR2019", "AUG2018", "AUG2019", "DEC2017", "DEC2018", "DEC2019", 
"FEB2018", "FEB2019", "JAN2018", "JAN2019", "JUL2018", "JUL2019", 
"JUN2018", "JUN2019", "MAR2018", "MAR2019", "MAY2018", "MAY2019", 
"NOV2017", "NOV2018", "NOV2019", "OCT2018", "OCT2019", "SEP2017", 
"SEP2018", "SEP2019"), class = "factor")), class = "data.frame", row.names = c(NA, 
-27L))
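ARIMA assumes equally spaced observations, so it can be worth confirming that consecutive rows are one calendar month apart. A minimal base-R sketch, assuming the Enrolment_date labels always have the form MMMYYYY as above; the variable names are illustrative:

```r
# Factor codes and levels copied from the dput() output above
codes <- c(25L, 20L, 5L, 10L, 8L, 16L, 1L, 18L, 14L, 12L, 3L, 26L, 23L, 21L,
           6L, 11L, 9L, 17L, 2L, 19L, 15L, 13L, 4L, 27L, 24L, 22L, 7L)
lev <- c("APR2018", "APR2019", "AUG2018", "AUG2019", "DEC2017", "DEC2018",
         "DEC2019", "FEB2018", "FEB2019", "JAN2018", "JAN2019", "JUL2018",
         "JUL2019", "JUN2018", "JUN2019", "MAR2018", "MAR2019", "MAY2018",
         "MAY2019", "NOV2017", "NOV2018", "NOV2019", "OCT2018", "OCT2019",
         "SEP2017", "SEP2018", "SEP2019")
date_labels <- lev[codes]  # Enrolment_date in row order

# Turn "MMMYYYY" into a running month index (month.abb is a fixed English vector)
mon <- match(substr(date_labels, 1, 3), toupper(month.abb))
yr  <- as.integer(substr(date_labels, 4, 7))
month_index <- yr * 12 + (mon - 1)

gaps <- diff(month_index)
stopifnot(all(gaps > 0))          # rows are in calendar order
date_labels[which(gaps != 1)]     # months that are followed by a gap
```

On this data the rows are in calendar order, but SEP2017 is followed directly by NOV2017, so there is no row for OCT2017.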

Code:

# The CSV itself is not included; the dput() output above recreates the same data frame
Enrollment_Data <- read.csv('EnrollmentRateT0.csv')

print(Enrollment_Data)
dput(Enrollment_Data)
# Load packages: tseries provides adf.test(); forecast provides auto.arima(), forecast() and accuracy()
library(tseries)
library(forecast)


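attach(Enrollment_Data)

# Step 1: Model identification
# Stationarity check: augmented Dickey-Fuller (ADF) test
# ADF p-value > 0.05, hence the data is non-stationary

# Third-order differencing: diff() applied three times (not a lag-3 difference)
d.COUNT <- diff(COUNT, differences = 3)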
summary(COUNT)
summary(d.COUNT)

plot(d.COUNT)

adf.test(d.COUNT, alternative="stationary")

acf(d.COUNT)
pacf(d.COUNT)

# Step 2: Model estimation
auto.arima(d.COUNT)
auto.arima(d.COUNT, stepwise = FALSE, approximation = FALSE)

arima.final <- auto.arima(d.COUNT, stepwise = FALSE, approximation = FALSE, D = 1)
arima.final

# Step 3: Diagnosis (standardized residuals, residual ACF, Ljung-Box p-values)
tsdiag(arima.final)

# Choose the model that has the lowest AIC and significant coefficients

#arima.final <- arima(COUNT, c(3,3,1))

forecast1 <- forecast(arima.final,h = 12)

forecast1

plot(forecast1)
class(forecast1)
print(forecast1)
summary(forecast1)
accuracy(forecast1)
plot(d.COUNT)

# predict() and forecast() should give the same point forecasts
p <- predict(arima.final, n.ahead = 12)
f <- forecast(arima.final, h = 12)
all.equal(f$mean, p$pred)

accuracy(f)
p
f

Results:

   Point Forecast      Lo 80    Hi 80     Lo 95     Hi 95
25  -234.78798559 -376.20497 -93.3710 -451.0666 -18.50937
26   248.28301149  -21.68036 518.2464 -164.5903 661.15636
27    38.07516814 -281.53132 357.6817 -450.7208 526.87112
28  -278.77782716 -600.00425  42.4486 -770.0513 212.49560
29   251.40378400  -74.76879 577.5764 -247.4341 750.24168
30   -31.49668698 -359.73170 296.7383 -533.4888 470.49545
31  -144.02466378 -474.75484 186.7055 -649.8328 361.78350
32   130.22859430 -211.26598 471.7232 -392.0423 652.49947
33    13.52166802 -332.92417 359.9675 -516.3215 543.36485
34  -123.35180366 -469.81119 223.1076 -653.2157 406.51210
35   103.92492852 -244.63788 452.4877 -429.1559 637.00574
36    -0.06911659 -349.40010 349.2619 -534.3247 534.18651
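These point forecasts are for d.COUNT, the third-order differences, not for the counts. As an illustration, base R's diffinv() can undo the differencing when given the first three observed counts as starting values (xi); appending the forecasts gives the counts they imply. This sketch only handles point forecasts; the 80%/95% intervals cannot be integrated this way. Variable names are illustrative.

```r
# Original counts (from the dput() output) and their third-order differences
COUNT <- c(17, 1, 5, 8, 45, 21, 18, 43, 82, 116, 192, 289, 242, 254, 335,
           138, 71, 98, 91, 138, 175, 232, 155, 376, 197, 271, 421)
d.COUNT <- diff(COUNT, differences = 3)

# Round trip: diffinv() needs lag * differences = 3 starting values
roundtrip <- as.numeric(diffinv(d.COUNT, differences = 3, xi = COUNT[1:3]))
stopifnot(isTRUE(all.equal(roundtrip, COUNT)))

# Point forecasts for periods 25-36, copied from the table above
fc <- c(-234.78798559, 248.28301149, 38.07516814, -278.77782716,
        251.40378400, -31.49668698, -144.02466378, 130.22859430,
        13.52166802, -123.35180366, 103.92492852, -0.06911659)

# Append the forecasts to the observed differences and integrate three times
levels_all <- as.numeric(diffinv(c(d.COUNT, fc), differences = 3, xi = COUNT[1:3]))
implied_counts <- tail(levels_all, 12)  # counts implied for the next 12 periods
implied_counts
```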

You're running auto.arima() on d.COUNT, which holds the third-order differences of the original Enrollment_Data$COUNT (diff(COUNT, differences = 3)), not the counts themselves. d.COUNT does contain lots of negative values, so its forecasts fluctuate around zero and can easily be negative. I believe you want to run auto.arima on Enrollment_Data$COUNT instead.
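A base-R sketch of modelling the counts directly. The order c(1, 1, 0) is an arbitrary assumption for illustration, not an order selected for this data; with the forecast package loaded, auto.arima(Enrollment_Data$COUNT) would search over p, d and q itself.

```r
# Enrolment counts copied from the dput() output in the question
COUNT <- c(17, 1, 5, 8, 45, 21, 18, 43, 82, 116, 192, 289, 242, 254, 335,
           138, 71, 98, 91, 138, 175, 232, 155, 376, 197, 271, 421)
y <- ts(COUNT)

# Illustrative order only (assumption): one regular difference plus an AR(1) term.
# arima() applies the d = 1 differencing itself, so predictions are on the count scale.
fit <- arima(y, order = c(1, 1, 0))
pred <- predict(fit, n.ahead = 12)

pred$pred  # point forecasts of COUNT
pred$se    # forecast standard errors
```

Because arima() differences internally, predict() returns forecasts of the counts rather than of the differences. A plain ARIMA does not by itself rule out negative forecasts; if that matters, one option is to model log counts, e.g. auto.arima(Enrollment_Data$COUNT, lambda = 0), whose forecasts are back-transformed to the original scale.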

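Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.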

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM