
Fitting a Regression Model to Time-Series Data

I am trying to fit a regression model to time-series data in Python (basically to predict the trend). I previously applied seasonal decomposition with statsmodels, which splits the data into its three components, including the trend. However, I would like to know how I can find the best fit to my data using statistics-based regressions (by defining arbitrary functions), and then check the sum of squares to compare the candidate models and select the one that fits my data best. I should mention that I am not looking for learning-based regressions, which rely on training/testing data. I would appreciate it if anyone could help me with this, or even point me to a tutorial on the topic.

Since you mentioned:

I would like to know how I can find the best fit to my data using statistics-based regressions (by defining arbitrary functions), and then check the sum of squares to compare the candidate models and select the one that fits my data best. I should mention that I am not looking for learning-based regressions, which rely on training/testing data.

One option is an ARIMA (AutoRegressive Integrated Moving Average) model with a chosen order (p, d, q), which is fitted on the history and can then predict() / forecast(). Please note that the data is split into train and test sets here only for the sake of evaluation, using walk-forward validation:

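from pandas import read_csv
# pandas.datetime was deprecated in pandas 1.0 and later removed; use the standard library
from datetime import datetime
from matplotlib import pyplot
# legacy import path, matching the statsmodels 0.10.2 used here; it was removed from later statsmodels releases
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt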
# load dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')
series = read_csv('/content/shampoo.txt', header=0, index_col=0, parse_dates=True, squeeze=True, date_parser=parser)
series.index = series.index.to_period('M')
# split into train and test sets
X = series.values
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()
# walk-forward validation
for t in range(len(test)):
    model = ARIMA(history, order=(5,1,0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))
# evaluate forecasts
rmse = sqrt(mean_squared_error(test, predictions))
rmse_ = 'Test RMSE: %.3f' % rmse

# plot forecasts against actual outcomes
pyplot.plot(test, label='test')
pyplot.plot(predictions, color='red', label='predict')
pyplot.xlabel('Months')
pyplot.ylabel('Sale')
pyplot.title(f'ARIMA model performance with {rmse_}')
pyplot.legend()
pyplot.show()
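
The walk-forward loop above can also be written independently of ARIMA, which may make the evaluation scheme easier to see. The following is only a minimal sketch: the helper names `walk_forward_rmse`, `naive_forecast`, and `drift_forecast` are hypothetical, and the series is synthetic rather than the shampoo data. At each step the forecaster sees only the history so far, predicts one step ahead, and then the true observation is appended to the history.

```python
import numpy as np

def walk_forward_rmse(values, train_fraction, forecast_one_step):
    """Walk-forward validation: forecast one step, then reveal the true value."""
    values = np.asarray(values, dtype=float)
    size = int(len(values) * train_fraction)
    history = list(values[:size])      # data the forecaster is allowed to see
    test = values[size:]
    predictions = []
    for obs in test:
        predictions.append(forecast_one_step(history))
        history.append(obs)            # the observation becomes part of the history
    errors = test - np.asarray(predictions)
    return float(np.sqrt(np.mean(errors ** 2)))

def naive_forecast(history):
    # hypothetical baseline: repeat the last observed value
    return history[-1]

def drift_forecast(history):
    # hypothetical baseline: last value plus the average historical step
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)

# synthetic, perfectly linear series (an assumption for illustration only)
series = np.arange(20, dtype=float) * 2.0 + 5.0
naive_rmse = walk_forward_rmse(series, 0.66, naive_forecast)
drift_rmse = walk_forward_rmse(series, 0.66, drift_forecast)
print(f"naive RMSE={naive_rmse:.3f}, drift RMSE={drift_rmse:.3f}")
```

Any one-step forecaster, including a function that fits an ARIMA model on `history` and returns its forecast, could be passed in the same way, so several models can be compared on the same RMSE.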

I used the same library package you mentioned (statsmodels), in the version shown below; the output includes a Root Mean Square Error (RMSE) evaluation:

import statsmodels as sm
sm.__version__ # '0.10.2'

[Image: plot produced by the code above]

Please see the other posts post1 and post2 for further information. You could also add a trend line.
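
As a rough illustration of the trend-line idea and of the sum-of-squares comparison asked about in the question, the sketch below fits polynomial trends of several degrees with `numpy.polyfit` and records the residual sum of squares (SSE) for each. The helper name `fit_polynomial_trend` is hypothetical, the data is synthetic, and polynomials are just one assumed family of candidate functions.

```python
import numpy as np

def fit_polynomial_trend(y, degree):
    """Least-squares polynomial trend over the time index 0..n-1."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    coeffs = np.polyfit(t, y, degree)          # highest power first
    fitted = np.polyval(coeffs, t)
    sse = float(np.sum((y - fitted) ** 2))     # residual sum of squares
    return coeffs, fitted, sse

# synthetic series with a quadratic trend plus noise (illustration only)
rng = np.random.default_rng(0)
t = np.arange(36)
y = 100.0 + 3.0 * t + 0.5 * t ** 2 + rng.normal(0.0, 5.0, size=36)

sse_by_degree = {d: fit_polynomial_trend(y, d)[2] for d in (1, 2, 3)}
for degree, sse in sse_by_degree.items():
    print(f"degree={degree}: SSE={sse:.1f}")
```

Because these models are nested, the in-sample SSE cannot increase as the degree grows, so the smallest SSE alone tends to favour the most flexible function. A penalised criterion such as AIC, or an out-of-sample check like the walk-forward RMSE, is usually a fairer basis for choosing between them.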
