
Parameters of ARIMA and SARIMAX

I'm working on a time series analysis and forecasting project. I have a dataframe with a lot of data, from which I need to handle the Covid cases. The dataframe looks like this:

            Covid cases  Confirmed Infections Difference
date                                                    
2020-02-24           19                              NaN
2020-02-25            0                            -19.0
2020-02-26            0                              0.0
2020-02-27            1                              1.0
2020-02-28            2                              1.0
...                 ...                              ...
2021-02-25         1502                           -136.0
2021-02-26         1468                            -34.0
2021-02-27         1474                              6.0
2021-02-28          715                           -759.0
2021-03-01          298                           -417.0

To make a prediction I use an ARIMA model (the data is stationary), and then I try to add a forecast line to my graph. I pass some parameters to ARIMA and SARIMAX and then plot the result with pandas. The forecast line follows the time series, but it does not appear where the series ends.

Code:

def timeseries(dataframe, city_name):
    cols = ['ID', 'name']  # Creating columns to be dropped
    dataframe.drop(cols, axis=1, inplace=True)  # Dropping columns that I don't need
    dataframe.columns = ["date", "Covid cases"]
    dataframe.describe()
    dataframe.set_index('date', inplace=True)
    dataframe.plot(figsize=(15, 6))  # Setting figure size
    dataframe['Confirmed Infections Difference'] = dataframe['Covid cases'] - dataframe['Covid cases'].shift(1)
    adfuller_test(dataframe['Confirmed Infections Difference'].dropna())
    model = ARIMA(dataframe['Covid cases'], order=(1, 1, 1))
    model_fit = model.fit(disp=0)
    print(model_fit.summary())
    dataframe['forecast'] = model_fit.predict(start=90, end=103, dynamic=True)
    model = sm.tsa.statespace.SARIMAX(dataframe['Covid cases'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
    results = model.fit()
    dataframe['forecast'] = results.predict(start=90, end=103, dynamic=True)
    future_dates = [dataframe.index[-1] + DateOffset(months=x) for x in range(0, 24)]
    future_datest_df = pd.DataFrame(index=future_dates[1:], columns=dataframe.columns)

    future_datest_df.tail()

    future_df = pd.concat([dataframe, future_datest_df])

    future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)
    future_df[['Covid cases', 'forecast']].plot(figsize=(12, 8))

Here is the result graph:

[image]

So, as you can see, the forecast does not seem to be applied correctly. I suppose the problem is with some of the parameters I'm passing to ARIMA and SARIMAX.

An example of the expected graph:

[image]

Reminder: the date column has one entry per day. I want the forecast to cover the next few days.

Any thoughts?

At several steps of your implementation you assign the results of new calculations to the column dataframe['forecast'] (in addition to predicting values twice with different models and concatenating dataframes with identically named columns):

print(model_fit.summary())
dataframe['forecast'] = model_fit.predict(start=90, end=103, dynamic=True)

# ...

dataframe['forecast'] = results.predict(start=90, end=103, dynamic=True)

# ...

future_df = pd.concat([dataframe, future_datest_df])

future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)

Please make sure that:

  • You are not replacing the whole column with each assignment when you mean to append new dataframe entries;
  • You are plotting the right columns at the end, given that several columns share similar names.
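
To illustrate the first point, here is a minimal sketch on made-up data (the values and the names first, second, forecast_first and forecast_second are hypothetical, not taken from the question). It shows that assigning a Series to an existing column replaces the whole column, aligned on the index, so anything stored by an earlier assignment is lost:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
idx = pd.date_range("2021-02-25", periods=5, freq="D")
df = pd.DataFrame({"Covid cases": [1502, 1468, 1474, 715, 298]}, index=idx)

# Two made-up "prediction" results that cover different date ranges.
first = pd.Series([1500.0, 1470.0], index=idx[:2])
second = pd.Series([700.0, 300.0], index=idx[3:])

# Assigning twice to the same column: the second assignment replaces the
# first one entirely; dates not covered by `second` become NaN.
df["forecast"] = first
df["forecast"] = second
overwritten = df["forecast"].copy()

# Keeping each result in its own column preserves both.
df["forecast_first"] = first
df["forecast_second"] = second

print(df)
```

Printing the dataframe before plotting is a quick way to check which rows of each forecast column actually hold values.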

I cannot be sure, because I don't have the full output of your code, but the error in the plot may come from one of these points.
