Time Series Forecasting the Beginning

Question

I have a series, hospitalization_diff, whose .head() is:

date
2020-10-16    347.0
2020-10-15    149.0
2020-10-14    530.0
2020-10-13   -489.0
2020-10-12   -859.0
Name: hospitalizedIncrease, dtype: float64

I want to forecast the time series using an ARIMA model (I have already tested for stationarity, differenced the series, and optimized the parameters). I got the code snippet from here.

# split into train-test set
size = int(len(hospitalization_diff) * 0.75)  # use the series itself; X is not defined in this snippet
train, test = hospitalization_diff[:size], hospitalization_diff[size:]

# Build Model
model = ARIMA(train, order=(0, 0, 1))  
fitted = model.fit(disp=-1)  

# Forecast
fc, se, conf = fitted.forecast(len(test), alpha=0.05)  # 95% conf

# Make as pandas series
fc_series = pd.Series(fc, index=test.index)
lower_series = pd.Series(conf[:, 0], index=test.index)
upper_series = pd.Series(conf[:, 1], index=test.index)

# Plot
plt.figure(figsize=(12,5), dpi=100)
plt.plot(train, label='training')
plt.plot(test, label='actual')
plt.plot(fc_series, label='forecast')
plt.fill_between(lower_series.index, lower_series, upper_series, 
                 color='k', alpha=.15)
plt.title('Forecast vs Actuals')
plt.legend(loc='upper left', fontsize=8)
plt.show()

However, the resulting plot (not reproduced here) shows the forecast at the start of the series rather than at the end.

I don't understand why it's predicting the start of the series. What am I doing wrong?

Answer 1

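Your series is in reverse chronological order: the .head() output starts at 2020-10-16 and goes backwards. As a result, both train and test are reversed. The first 75% of the rows, which you use as train, holds the most recent dates, and test holds the oldest ones, so the forecast is drawn over the start of the series. Sort the index in ascending order before doing anything else.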

# apply at the beginning of your code
hospitalization_diff.sort_index(inplace=True)
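
To see the effect of the ordering on the split, here is a minimal sketch. The series is a synthetic stand-in for hospitalization_diff (the values are made up), and chronological_split is a hypothetical helper name, not part of pandas or of the original code.

```python
import pandas as pd

# Synthetic stand-in for hospitalization_diff (made-up values), newest date first
dates = pd.date_range("2020-10-09", periods=8, freq="D")[::-1]
series = pd.Series([8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
                   index=dates, name="hospitalizedIncrease")

def chronological_split(s, train_fraction=0.75):
    """Sort by date, then split so that train precedes test in time."""
    s = s.sort_index()
    size = int(len(s) * train_fraction)
    return s.iloc[:size], s.iloc[size:]

# Split without sorting: the "test" slice ends up holding the oldest dates
size = int(len(series) * 0.75)
bad_train, bad_test = series.iloc[:size], series.iloc[size:]

# Split after sorting: test follows train in time
train, test = chronological_split(series)

print(bad_test.index.max() < bad_train.index.min())  # True: test lies before train
print(train.index.max() < test.index.min())          # True: test lies after train
```

With the sorted split, the forecast for len(test) steps lines up with the end of the series instead of its beginning.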
