
Multivariate time series forecasting with a 3-month dataset

I have generated 3 months of data (one row per day), and I want to perform a multivariate time series analysis on it.

The available columns are:

Date    Capacity_booked Total_Bookings  Total_Searches  %Variation

Each date has exactly one entry, the dataset covers 3 months, and I want to fit a multivariate time series model that forecasts the other variables as well.

So far, this is my attempt, based on what I learned from reading articles:

import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen
from statsmodels.tsa.vector_ar.var_model import VAR

# df is the raw DataFrame loaded from the dataset (loading code not shown)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

data = df.drop(['Date'], axis=1)

data.index = df.Date

# Johansen cointegration test: det_order=-1 (no deterministic terms), k_ar_diff=1
johan_test_temp = data
coint_johansen(johan_test_temp, -1, 1).eig

# creating the train and validation set (first 80% for training, last 20% for validation)
train = data[:int(0.8*(len(data)))]
valid = data[int(0.8*(len(data))):]

freq = train.index.inferred_freq

model = VAR(endog=train, freq=freq)
model_fit = model.fit()

# make prediction on validation, starting from the last k_ar rows of the training data
prediction = model_fit.forecast(train.values[-model_fit.k_ar:], steps=len(valid))

# put the forecasts in a DataFrame aligned with the validation set
cols = data.columns
pred = pd.DataFrame(prediction, index=valid.index, columns=cols)

I have a validation set and a prediction set. However, the predictions are much worse than expected.

The plots of the dataset are:

  1. %Variation [plot]

  2. Capacity_Booked [plot]

  3. Total bookings and searches [plot]

The output I am receiving is:

Prediction DataFrame: [screenshot]

Validation DataFrame: [screenshot]

As you can see, the predictions are far off from what is expected. Can anyone suggest a way to improve the accuracy? Also, if I fit the model on the whole dataset and then print the forecasts, the model does not take into account that a new month has started, so it does not predict accordingly. How can that be incorporated here? Any help is appreciated.

EDIT

Link to the dataset: Dataset

Thanks

One way to improve your accuracy is to look at the autocorrelation of each variable, as suggested in the VAR documentation page:

https://www.statsmodels.org/dev/vector_ar.html

The larger the autocorrelation at a specific lag, the more useful that lag is likely to be to the model.

Another good idea is to look at the AIC and BIC criteria to compare candidate models (the same link above has a usage example). Smaller values indicate a better trade-off between goodness of fit and the number of estimated parameters.

This way, you can vary the order of your autoregressive model and look for the one with the lowest AIC and BIC, analyzing both together. If AIC indicates that the best model has a lag of 3 and BIC indicates a lag of 5, you should evaluate lags 3, 4 and 5 to see which one gives the best results.

The best scenario would be to have more data (3 months is not much), but you can try these approaches to see if they help.
