
How to find the accuracy of an ARIMA model?

Posted 2017-08-11 02:52:37 · Tags: python, time-series, arima

Problem description: Prediction of CPU utilization.
Approach: Used a time series algorithm.

Step 1: I collected 1000 observations from Elasticsearch and loaded them into Python.

Step 2: Plotted the data and checked whether it is stationary.

Step 3: Applied a log transform to make the data stationary.

Step 4: Ran the Dickey-Fuller (DF) test and examined the ACF and PACF.

Step 5: Built an ARIMA(3,0,2) model.

Step 6: Forecast.

I built an ARIMA(3,0,2) time-series model but was unable to find the model's accuracy. Is there a command in Python that checks the accuracy of a model?

Could you please advise whether my approach was correct, and how to find the accuracy of the model in Python?

2 answers

Is the approach correct?

I hope you found the best p and q values from the ACF and PACF. There is Python code on GitHub that does something like auto ARIMA (automatically finding the best parameters), so you don't have to worry about the p and q values. Basically, one picks the p and q values for which the model's BIC is lowest.

Python code:

There are three primary metrics used to evaluate linear models: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).

MAE: The easiest to understand. It represents the average absolute error.

MSE: Similar to MAE, but noise is exaggerated and larger errors are "punished" more heavily. It is harder to interpret than MAE because it is in squared units rather than the base units; however, it is generally more popular.

RMSE: The most popular metric. It is similar to MSE, but the result is square-rooted, which makes it easier to interpret because it is in the base units. It is recommended that RMSE be used as the primary metric to interpret your model.
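For predictions $\hat{y}_i$ of $n$ true values $y_i$, the three metrics are:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

Squaring each error is what makes MSE penalize large errors more heavily and puts it in squared units; the square root in RMSE brings the result back to the units of $y$.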

Each metric is calculated from two lists: one holding your predicted values and the other holding the true values.
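A dependency-free sketch of the three metrics in plain Python. The function names `mae`, `mse`, and `rmse` are my own, and both inputs are assumed to be non-empty sequences of numbers of equal length.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs instead of silently truncating."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |actual - predicted|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of (actual - predicted) ** 2."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the MSE, in the data's units."""
    return math.sqrt(mse(y_true, y_pred))


if __name__ == "__main__":
    actual = [3.0, 5.0, 2.5, 7.0]
    predicted = [2.5, 5.0, 4.0, 8.0]
    # MAE 0.75, MSE 0.875, RMSE about 0.935
    print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

If scikit-learn is installed, `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error` compute the same MAE and MSE, and RMSE is the square root of the latter.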

I have been researching this and, unfortunately, could not find a score function in statsmodels for Python. I would recommend visiting this site, as suggested in an answer to an earlier post.

Also, as noted in that answer, "statsmodels does have performance measures for continuous dependent variables."

Hopefully someone will find an answer. If I find anything about this, I will definitely post it to the community.