I have a daily-frequency time series dataset:
Births
Date
1959-01-01 35
1959-01-02 32
1959-01-03 30
1959-01-04 31
1959-01-05 44
1959-01-06 29
1959-01-07 45
1959-01-08 43
1959-01-09 38
1959-01-10 27
1959-01-11 38
1959-01-12 33
1959-01-13 55
1959-01-14 47
1959-01-15 45
1959-01-16 37
1959-01-17 50
1959-01-18 43
1959-01-19 41
1959-01-20 52
1959-01-21 34
1959-01-22 53
1959-01-23 39
1959-01-24 32
1959-01-25 37
1959-01-26 43
1959-01-27 39
1959-01-28 35
1959-01-29 44
1959-01-30 38
1959-01-31 24
1959-02-01 23
1959-02-02 31
1959-02-03 44
1959-02-04 38
1959-02-05 50
1959-02-06 38
1959-02-07 51
1959-02-08 31
1959-02-09 31
1959-02-10 51
1959-02-11 36
1959-02-12 45
1959-02-13 51
1959-02-14 34
1959-02-15 52
1959-02-16 47
1959-02-17 45
1959-02-18 46
1959-02-19 39
1959-02-20 48
1959-02-21 37
1959-02-22 35
1959-02-23 52
1959-02-24 42
1959-02-25 45
1959-02-26 39
1959-02-27 37
1959-02-28 30
1959-03-01 35
To check for stationarity, I ran the Augmented Dickey-Fuller test, which indicated that the series is stationary.
I wanted to fit an ARMA model, given that there is no seasonal component and the data are stationary. To find the best values of (p, q), I used:
from pmdarima import auto_arima
auto_arima(df1['Births'],start_p=1,max_p=6, start_q=1, max_q=6, seasonal=False, trace = True).summary()
It returned:
Fit ARIMA: (0, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=419.527, BIC=423.716, Time=0.032 seconds
Fit ARIMA: (0, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=421.238, BIC=427.521, Time=0.082 seconds
Fit ARIMA: (0, 0, 2)x(0, 0, 0, 0) (constant=True); AIC=421.309, BIC=429.687, Time=0.095 seconds
Fit ARIMA: (0, 0, 3)x(0, 0, 0, 0) (constant=True); AIC=422.696, BIC=433.168, Time=0.135 seconds
Fit ARIMA: (0, 0, 4)x(0, 0, 0, 0) (constant=True); AIC=424.376, BIC=436.942, Time=0.185 seconds
Fit ARIMA: (0, 0, 5)x(0, 0, 0, 0) (constant=True); AIC=426.365, BIC=441.026, Time=0.258 seconds
Fit ARIMA: (1, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=421.148, BIC=427.431, Time=0.016 seconds
Fit ARIMA: (1, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=422.261, BIC=430.639, Time=0.244 seconds
Fit ARIMA: (1, 0, 2)x(0, 0, 0, 0) (constant=True); AIC=423.047, BIC=433.519, Time=0.282 seconds
Fit ARIMA: (1, 0, 3)x(0, 0, 0, 0) (constant=True); AIC=424.396, BIC=436.962, Time=0.427 seconds
Fit ARIMA: (1, 0, 4)x(0, 0, 0, 0) (constant=True); AIC=426.380, BIC=441.041, Time=0.228 seconds
Fit ARIMA: (2, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=421.586, BIC=429.963, Time=0.144 seconds
Fit ARIMA: (2, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=423.493, BIC=433.965, Time=0.226 seconds
Fit ARIMA: (2, 0, 2)x(0, 0, 0, 0) (constant=True); AIC=422.342, BIC=434.908, Time=0.469 seconds
Fit ARIMA: (2, 0, 3)x(0, 0, 0, 0) (constant=True); AIC=422.484, BIC=437.144, Time=0.517 seconds
Fit ARIMA: (3, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=423.349, BIC=433.821, Time=0.232 seconds
Fit ARIMA: (3, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=424.792, BIC=437.358, Time=0.438 seconds
Fit ARIMA: (3, 0, 2)x(0, 0, 0, 0) (constant=True); AIC=422.814, BIC=437.475, Time=0.518 seconds
Fit ARIMA: (4, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=424.320, BIC=436.886, Time=0.356 seconds
Fit ARIMA: (4, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=426.278, BIC=440.938, Time=0.347 seconds
Fit ARIMA: (5, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=426.249, BIC=440.909, Time=0.574 seconds
Total fit time: 5.839 seconds
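                           SARIMAX Results
==============================================================================
Dep. Variable:                 y   No. Observations:                  60
Model:                   SARIMAX   Log Likelihood               -207.764
Date:           Wed, 19 Feb 2020   AIC                           419.527
Time:                   12:06:46   BIC                           423.716
Sample:                        0   HQIC                          421.166
                            - 60
Covariance Type:             opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     39.9333      0.997     40.068      0.000      37.980      41.887
sigma2        59.5956     13.897      4.288      0.000      32.358      86.833
==============================================================================
Ljung-Box (Q):                51.46   Jarque-Bera (JB):             1.50
Prob(Q):                       0.11   Prob(JB):                     0.47
Heteroskedasticity (H):        0.80   Skew:                        -0.01
Prob(H) (two-sided):           0.63   Kurtosis:                     2.23
==============================================================================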
The model with the lowest AIC is SARIMAX(0,0,0).
d = 0 is understandable: no differencing is required. But with p and q also 0, what does that signify technically? Is it okay to have both p and q equal to 0? Please let me know if anything is unclear.
Your time series resembles (mean-shifted) white noise; the data show no evidence of an underlying auto-regressive (AR) or moving-average (MA) process. An ARIMA(0,0,0) model with a non-zero mean is therefore consistent with your data, so having both p and q equal to 0 is fine.
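To make the p = q = 0 case concrete, here is the ARMA equation written out. The notation (c, phi, theta, epsilon) is the usual textbook convention and is my assumption, not something taken from the question; the numerical values are the intercept and sigma2 from the summary output.

```latex
% General ARMA(p, q) with a constant, \varepsilon_t being white noise with variance \sigma^2
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% With p = q = 0 both sums are empty, leaving a constant plus noise
y_t = c + \varepsilon_t, \qquad \hat{c} \approx 39.93, \quad \hat{\sigma}^2 \approx 59.60
```

In other words, each observation is modelled as the overall mean plus an independent random shock; past values and past shocks carry no extra information.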
Prior to fitting (S)ARIMA models, it is always instructive to take a look at the raw data.
import matplotlib.pyplot as plt
import matplotlib.dates as dates
# `data` is the DataFrame built at the end of this answer
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(data.Date, data.Births)
# Major ticks at 10-day intervals, labelled as day/month
ax.xaxis.set_major_locator(dates.DayLocator(interval = 10))
ax.xaxis.set_major_formatter(dates.DateFormatter('%d/%m'))
ax.set_xlabel('DD/MM in 1959')
ax.set_ylabel("Births")
plt.show()
Already here we see that the data more or less resemble white noise.
We can explore this further by plotting the ACF and PACF:
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
plt.figure()
plt.subplot(211)
plot_acf(data["Births"], ax = plt.gca())
plt.subplot(212)
plot_pacf(data["Births"], ax = plt.gca())
plt.show()
There is no significant (partial) auto-correlation at lags < 10. The few significant spikes at higher lags in the PACF may simply be chance (with 95% CIs, roughly 1 in 20 lags is expected to fall outside the band even for pure white noise), or they could be due to some other time series "abnormality". IMO, with only 60 observations, there is just not enough information to say much more.
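For readers who want the numbers behind the ACF plot, the following is a minimal standard-library sketch. The helper name `sample_acf` and the choice of 20 lags are mine, and the flat ±1.96/√n band is the simple white-noise approximation; as far as I know the bands drawn by `plot_acf` can differ (they may widen with lag), so treat this as a rough cross-check rather than a reproduction of the plot.

```python
import math

births = [35, 32, 30, 31, 44, 29, 45, 43, 38, 27, 38, 33, 55, 47, 45,
          37, 50, 43, 41, 52, 34, 53, 39, 32, 37, 43, 39, 35, 44, 38, 24,
          23, 31, 44, 38, 50, 38, 51, 31, 31, 51, 36, 45, 51, 34, 52, 47,
          45, 46, 39, 48, 37, 35, 52, 42, 45, 39, 37, 30, 35]

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (hypothetical helper)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    # Each lag-k sum of cross-products is divided by the lag-0 sum
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

acf = sample_acf(births, 20)
# Approximate 95% band for white noise (assumption: flat band, not lag-dependent)
bound = 1.96 / math.sqrt(len(births))

print(f"approx. 95% band: +/-{bound:.3f}")
for lag, r in enumerate(acf[1:], start=1):
    flag = "*" if abs(r) > bound else ""
    print(f"lag {lag:2d}: r = {r:+.3f} {flag}")
```

Lags marked with `*` fall outside the approximate band; for a white-noise series a small number of such lags is expected by chance alone.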
Let's re-run auto_arima:
from pmdarima import auto_arima
auto_arima(
data["Births"],
start_p = 1, max_p = 6,
start_q = 1, max_q = 6,
seasonal = False, trace = True).summary()
#Performing stepwise search to minimize aic
#/Users/maurits/miniconda3/lib/python3.5/site-packages/statsmodels/base/model.py:568: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
# "Check mle_retvals", ConvergenceWarning)
#Fit ARIMA: (1, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=422.260, BIC=430.637, Time=0.255 seconds
#Fit ARIMA: (0, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=419.527, BIC=423.716, Time=0.010 seconds
#Fit ARIMA: (1, 0, 0)x(0, 0, 0, 0) (constant=True); AIC=421.148, BIC=427.431, Time=0.040 seconds
#Fit ARIMA: (0, 0, 1)x(0, 0, 0, 0) (constant=True); AIC=421.238, BIC=427.521, Time=0.058 seconds
#Fit ARIMA: (0, 0, 0)x(0, 0, 0, 0) (constant=False); AIC=616.939, BIC=619.034, Time=0.010 seconds
#Total fit time: 0.383 seconds
#<class 'statsmodels.iolib.summary.Summary'>
#"""
# SARIMAX Results
#==============================================================================
#Dep. Variable: y No. Observations: 60
#Model: SARIMAX Log Likelihood -207.764
#Date: Wed, 19 Feb 2020 AIC 419.527
#Time: 20:20:11 BIC 423.716
#Sample: 0 HQIC 421.166
# - 60
#Covariance Type: opg
#==============================================================================
# coef std err z P>|z| [0.025 0.975]
#------------------------------------------------------------------------------
#intercept 39.9333 0.997 40.068 0.000 37.980 41.887
#sigma2 59.5956 13.897 4.288 0.000 32.358 86.833
#===================================================================================
#Ljung-Box (Q): 51.46 Jarque-Bera (JB): 1.50
#Prob(Q): 0.11 Prob(JB): 0.47
#Heteroskedasticity (H): 0.80 Skew: -0.01
#Prob(H) (two-sided): 0.63 Kurtosis: 2.23
#===================================================================================
#
#Warnings:
#[1] Covariance matrix calculated using the outer product of gradients (complex-step).
#"""
The best model is a mean-shifted white-noise model (the shift in mean is given by the intercept parameter).
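Because the selected model is just a constant plus Gaussian noise, its fit statistics can be checked by hand. The sketch below uses only the standard library and assumes the usual definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, with k counting the intercept and sigma2; the helper name `gaussian_loglik` is mine. It is a cross-check of the reported numbers, not a description of how auto_arima computes them internally.

```python
import math

births = [35, 32, 30, 31, 44, 29, 45, 43, 38, 27, 38, 33, 55, 47, 45,
          37, 50, 43, 41, 52, 34, 53, 39, 32, 37, 43, 39, 35, 44, 38, 24,
          23, 31, 44, 38, 50, 38, 51, 31, 31, 51, 36, 45, 51, 34, 52, 47,
          45, 46, 39, 48, 37, 35, 52, 42, 45, 39, 37, 30, 35]
n = len(births)

def gaussian_loglik(x, mean):
    """Maximised Gaussian log-likelihood for i.i.d. noise around a fixed mean."""
    sigma2 = sum((v - mean) ** 2 for v in x) / len(x)  # ML estimate of the variance
    loglik = -0.5 * len(x) * (math.log(2 * math.pi * sigma2) + 1)
    return loglik, sigma2

# ARIMA(0,0,0) with a constant: 2 parameters (intercept, sigma2)
mean = sum(births) / n
ll_const, sigma2 = gaussian_loglik(births, mean)
aic_const = 2 * 2 - 2 * ll_const
bic_const = 2 * math.log(n) - 2 * ll_const

# ARIMA(0,0,0) without a constant: mean fixed at 0, 1 parameter (sigma2)
ll_zero, _ = gaussian_loglik(births, 0.0)
aic_zero = 2 * 1 - 2 * ll_zero

print(f"intercept = {mean:.4f}, sigma2 = {sigma2:.4f}")
print(f"log-likelihood = {ll_const:.3f}, AIC = {aic_const:.3f}, BIC = {bic_const:.3f}")
print(f"AIC without constant = {aic_zero:.3f}")
```

The values agree with the summary (intercept 39.9333, sigma2 59.5956, AIC 419.527, BIC 423.716) and with the constant=False line in the trace (AIC 616.939), which shows that the "model" here amounts to the sample mean and variance of the series.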
Sample data:
import pandas as pd
data = pd.DataFrame({
"Date": ["1959-01-01", "1959-01-02", "1959-01-03",
"1959-01-04", "1959-01-05", "1959-01-06", "1959-01-07", "1959-01-08",
"1959-01-09", "1959-01-10", "1959-01-11", "1959-01-12", "1959-01-13",
"1959-01-14", "1959-01-15", "1959-01-16", "1959-01-17", "1959-01-18",
"1959-01-19", "1959-01-20", "1959-01-21", "1959-01-22", "1959-01-23",
"1959-01-24", "1959-01-25", "1959-01-26", "1959-01-27", "1959-01-28",
"1959-01-29", "1959-01-30", "1959-01-31", "1959-02-01", "1959-02-02",
"1959-02-03", "1959-02-04", "1959-02-05", "1959-02-06", "1959-02-07",
"1959-02-08", "1959-02-09", "1959-02-10", "1959-02-11", "1959-02-12",
"1959-02-13", "1959-02-14", "1959-02-15", "1959-02-16", "1959-02-17",
"1959-02-18", "1959-02-19", "1959-02-20", "1959-02-21", "1959-02-22",
"1959-02-23", "1959-02-24", "1959-02-25", "1959-02-26", "1959-02-27",
"1959-02-28", "1959-03-01"],
"Births": [35, 32, 30, 31, 44, 29, 45, 43, 38, 27, 38, 33, 55, 47, 45,
37, 50, 43, 41, 52, 34, 53, 39, 32, 37, 43, 39, 35, 44, 38, 24,
23, 31, 44, 38, 50, 38, 51, 31, 31, 51, 36, 45, 51, 34, 52, 47,
45, 46, 39, 48, 37, 35, 52, 42, 45, 39, 37, 30, 35]
})
data["Date"] = pd.to_datetime(data["Date"])