

Unrealistic Mean Squared Error with statsmodels ARIMA

Foreword: I have no idea what I'm doing.

For a uni stats class we have to do some time series forecasting in Python.

I've basically followed this tutorial but used my own data: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3

Everything is working perfectly fine, except the MSE.

When everything is plotted, it looks like this:

[image: plot not preserved]

Here's the data I use for the MSE:

Original data (transactions['2016-05-01':]):

DATE_BOOKING
2016-05-01    11327.548387
2016-06-01    11534.000000
2016-07-01    11391.677419
2016-08-01    11259.451613
2016-09-01    11968.366667
2016-10-01     7844.387097
2016-11-01     6270.800000
2016-12-01     5103.516129
2017-01-01     4631.032258
2017-02-01     5092.928571
2017-03-01     7800.258065
2017-04-01     8359.133333
2017-05-01     9495.062500

Forecasted (predicted) data (pred.predicted_mean):

DATE_BOOKING
2016-05-01     9375.120610
2016-06-01    11038.420268
2016-07-01    11571.006853
2016-08-01    10856.183244
2016-09-01    10148.262512
2016-10-01     9433.060067
2016-11-01     7044.780142
2016-12-01     5037.930509
2017-01-01     5337.963486
2017-02-01     5767.081120
2017-03-01     6616.610224
2017-04-01     9389.836132
2017-05-01    10258.791544

I'm calculating the MSE the following way:

import numpy as np

# Forecasts and the matching slice of observed values (both indexed by DATE_BOOKING)
transactions_forecasted = pred.predicted_mean
transactions_truth = transactions['2016-05-01':]

# Mean of the squared forecast errors, then its square root
mse = ((transactions_forecasted - transactions_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))

This is the result:
The Mean Squared Error of our forecasts is 1130250.12
The Root Mean Squared Error of our forecasts is 1063.13

Compared to other MSEs I've googled, it seems awfully high.

Can you tell me what I'm doing wrong?

I can post more (all) code if needed.

Thanks in advance!

Mean squared error can't be compared across datasets, because its magnitude depends on the units of the dataset. So you can't compare the MSE you're getting here to the MSE you see in example problems using other data.
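To make the unit dependence concrete, here is a minimal sketch with made-up numbers (not the data from the question). The helper `mean_squared_error` is written just for this illustration; it computes the MSE once in single units and once after expressing the same series in thousands.

```python
# Made-up illustrative values, not the data from the question.
truth = [10000.0, 12000.0, 8000.0]
forecast = [9000.0, 13000.0, 8500.0]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# The same two series, expressed in thousands instead of single units.
truth_k = [v / 1000 for v in truth]
forecast_k = [v / 1000 for v in forecast]

mse_units = mean_squared_error(truth, forecast)
mse_thousands = mean_squared_error(truth_k, forecast_k)

print('MSE in single units:', mse_units)
print('MSE in thousands:   ', mse_thousands)
```

The forecasts are equally good in both cases, yet the two MSE values differ by a factor of one million (the square of the scaling factor). That is why an MSE found in a tutorial with differently scaled data says nothing about this one.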

One way to tell whether the MSE value you're getting is reasonable is to look at the root mean squared error, which is on the scale of your original dataset. It's about 1000, and on average it looks like the forecasts are roughly 1000 away from the true values.

(This second part is a bit of a simplification, since RMSE penalizes large errors more than small errors, but it gives you an approximate check that the value you're getting is in the right ballpark.)
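As a rough sanity check along these lines, the sketch below recomputes the RMSE from the values posted in the question and puts it next to the mean absolute error (MAE) and the average level of the series. It uses plain Python lists rather than the pandas Series from the question, and the MAE and the RMSE-to-mean ratio are extra checks added here for illustration, not something the tutorial computes.

```python
import math

# Observed values from the question (transactions['2016-05-01':]).
truth = [
    11327.548387, 11534.000000, 11391.677419, 11259.451613, 11968.366667,
    7844.387097, 6270.800000, 5103.516129, 4631.032258, 5092.928571,
    7800.258065, 8359.133333, 9495.062500,
]

# Forecasts from the question (pred.predicted_mean).
forecast = [
    9375.120610, 11038.420268, 11571.006853, 10856.183244, 10148.262512,
    9433.060067, 7044.780142, 5037.930509, 5337.963486, 5767.081120,
    6616.610224, 9389.836132, 10258.791544,
]

errors = [f - t for f, t in zip(forecast, truth)]

mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)

# MAE weights every error equally, so it is never larger than the RMSE.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE relative to the average observed level gives a unit-free impression.
mean_truth = sum(truth) / len(truth)
relative_rmse = rmse / mean_truth

print('MSE:  {:.2f}'.format(mse))
print('RMSE: {:.2f}'.format(rmse))
print('MAE:  {:.2f}'.format(mae))
print('RMSE as a share of the mean observed value: {:.1%}'.format(relative_rmse))
```

The MAE comes out somewhat below the RMSE, which reflects the handful of months with larger misses, and the RMSE is a modest fraction of the typical monthly value. Both are consistent with the forecasts being reasonable rather than the calculation being wrong.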
