Multivariate time series forecasting with 3 months dataset
I have 3 months of generated data (one row per day) and I want to perform a multivariate time series analysis on it.
The available columns are:
Date Capacity_booked Total_Bookings Total_Searches %Variation
Each date has one entry in the dataset, which covers 3 months, and I want to fit a multivariate time series model that forecasts the other variables as well.
This is my attempt so far, put together from articles I have read:
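import pandas as pd
# df is assumed to be loaded already, with a 'Date' column in dd/mm/yyyy format
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
data = df.drop(['Date'], axis=1)
data.index = df['Date']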
from statsmodels.tsa.vector_ar.vecm import coint_johansen
johan_test_temp = data
coint_johansen(johan_test_temp,-1,1).eig
#creating the train and validation set
train = data[:int(0.8*(len(data)))]
valid = data[int(0.8*(len(data))):]
freq=train.index.inferred_freq
from statsmodels.tsa.vector_ar.var_model import VAR
model = VAR(endog=train,freq=train.index.inferred_freq)
model_fit = model.fit()
# make prediction on validation
# forecast() expects the last k_ar observations of the training data as an array
prediction = model_fit.forecast(train.values[-model_fit.k_ar:], steps=len(valid))
# store the forecasts in a DataFrame aligned with the validation dates
pred = pd.DataFrame(prediction, index=valid.index, columns=data.columns)
I have a validation set and a prediction set. However, the predictions are far worse than expected.
The plots of the dataset are:
1. % Variation [plot]
The output that I am receiving is:
Prediction dataframe [image]
Validation dataframe [image]
As you can see, the predictions are way off what is expected. Can anyone advise a way to improve the accuracy?
Also, if I fit the model on the whole data and then print the forecasts, it doesn't take into account that a new month has started and predict accordingly. How can that be incorporated here?
Any help is appreciated.
EDIT
Link to the dataset - Dataset
Thanks
One way to improve your accuracy is to look at the autocorrelation of each variable, as suggested in the VAR documentation page:
https://www.statsmodels.org/dev/vector_ar.html
The larger the autocorrelation value at a specific lag, the more useful that lag will be to the process.
Another good idea is to look at the AIC and BIC criteria to assess your model (the link above has a usage example). Smaller values indicate a higher probability that you have found the true estimator.
This way, you can vary the order of your autoregressive model and pick the one that gives the lowest AIC and BIC, looking at both together. If AIC indicates that the best model has a lag of 3 and BIC indicates a lag of 5, you should evaluate lags 3, 4 and 5 to see which one gives the best results.
The best scenario would be to have more data (3 months is not much), but you can try these approaches to see if they help.