Unrealistic Mean Squared Error with statsmodels ARIMA

Foreword: I have no idea what I'm doing.

For a uni stats class we have to do some time series forecasting in Python.

I've basically followed this tutorial but used my data: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3

Everything is working perfectly fine, except the MSE.

When I plot everything, it looks like this:

[image]

Here's my data which I use for the MSE:

Original data (transactions['2016-05-01':]):

DATE_BOOKING
2016-05-01    11327.548387
2016-06-01    11534.000000
2016-07-01    11391.677419
2016-08-01    11259.451613
2016-09-01    11968.366667
2016-10-01     7844.387097
2016-11-01     6270.800000
2016-12-01     5103.516129
2017-01-01     4631.032258
2017-02-01     5092.928571
2017-03-01     7800.258065
2017-04-01     8359.133333
2017-05-01     9495.062500

Forecasted (predicted) data (pred.predicted_mean):

DATE_BOOKING
2016-05-01     9375.120610
2016-06-01    11038.420268
2016-07-01    11571.006853
2016-08-01    10856.183244
2016-09-01    10148.262512
2016-10-01     9433.060067
2016-11-01     7044.780142
2016-12-01     5037.930509
2017-01-01     5337.963486
2017-02-01     5767.081120
2017-03-01     6616.610224
2017-04-01     9389.836132
2017-05-01    10258.791544

I'm calculating the MSE the following way:

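import numpy as np

# pred and transactions come from the model fitted earlier (see the tutorial)
transactions_forecasted = pred.predicted_mean
transactions_truth = transactions['2016-05-01':]
# Mean of the squared differences between forecast and observed values
mse = ((transactions_forecasted - transactions_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))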

This is the result:
The Mean Squared Error of our forecasts is 1130250.12
The Root Mean Squared Error of our forecasts is 1063.13

Compared to other MSEs I've googled it seems awfully high.

Can you tell me what I'm doing wrong?

I can post more (all) code if needed.

Thanks in advance!

Mean squared error can't be compared across datasets, because its magnitude depends on the units of the data (it is expressed in squared units of the series). So you can't compare the MSE you're getting here to the MSE you see in example problems using other data.
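To illustrate the unit dependence, here is a small sketch using the first five observed and forecasted values from the question. The rescaling to thousands is purely hypothetical, as is the helper name `mse`; the point is only that the same forecasts give an MSE a million times smaller when the series is expressed in different units.

```python
import numpy as np

# First five observed and forecasted values, copied from the question
truth = np.array([11327.548387, 11534.000000, 11391.677419,
                  11259.451613, 11968.366667])
forecast = np.array([9375.120610, 11038.420268, 11571.006853,
                     10856.183244, 10148.262512])


def mse(y_true, y_pred):
    """Mean of the squared forecast errors."""
    return np.mean((y_pred - y_true) ** 2)


# MSE in the original units of the series
mse_units = mse(truth, forecast)

# Hypothetical rescaling: the same series expressed in thousands
mse_thousands = mse(truth / 1000, forecast / 1000)

# The forecasts are equally good in both cases, but the MSE
# differs by a factor of 1000 ** 2
print(mse_units, mse_thousands, mse_units / mse_thousands)
```

The forecast quality is identical in both cases, so the raw MSE value on its own says little without knowing the scale of the data.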

One way to check that the MSE value you're getting is reasonable is to look at the root mean squared error, which is on the same scale as your original data. It's about 1000, and from the values you posted the forecasts do look to be roughly 1000 away from the true values on average.

(This second part is a bit of a simplification, since RMSE penalizes large errors more than small errors, but it gives you an approximate check that the value you're getting is in the ballpark.)
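As a rough sanity check along these lines, the sketch below recomputes the error metrics directly from the numbers posted in the question. The mean absolute error and the RMSE-to-mean ratio are not part of the original code; they are added here only as assumptions about what a scale-aware check might look like.

```python
import numpy as np

# Observed values from the question (transactions['2016-05-01':])
truth = np.array([
    11327.548387, 11534.000000, 11391.677419, 11259.451613,
    11968.366667, 7844.387097, 6270.800000, 5103.516129,
    4631.032258, 5092.928571, 7800.258065, 8359.133333,
    9495.062500,
])

# Forecasted values from the question (pred.predicted_mean)
forecast = np.array([
    9375.120610, 11038.420268, 11571.006853, 10856.183244,
    10148.262512, 9433.060067, 7044.780142, 5037.930509,
    5337.963486, 5767.081120, 6616.610224, 9389.836132,
    10258.791544,
])

errors = forecast - truth

mse = np.mean(errors ** 2)       # squared units of the series
rmse = np.sqrt(mse)              # same units as the series
mae = np.mean(np.abs(errors))    # average absolute miss, same units

# RMSE relative to the average level of the series (unitless)
relative_rmse = rmse / np.mean(truth)

print('MSE:  {:.2f}'.format(mse))
print('RMSE: {:.2f}'.format(rmse))
print('MAE:  {:.2f}'.format(mae))
print('RMSE / mean of observed: {:.1%}'.format(relative_rmse))
```

MSE and RMSE should match the values printed in the question. MAE can never exceed RMSE, because squaring gives the larger misses more weight, and the ratio to the mean gives a unitless figure that is easier to judge than the raw MSE.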
