
R - Forecast multiple time-series (15K Products)

Hi Stack Overflow community.

I have 5 years of weekly price data for more than 15K products (5 * 15K * 52 records, roughly 3.9 million rows). Each product is a univariate time series. The objective is to forecast the price of each product.

I am familiar with univariate time series analysis, in which we visualize each series, plot its ACF and PACF, and forecast it. But that manual approach is not feasible here: with 15K different time series I cannot visualize each series, inspect its ACF and PACF, forecast each product separately, and tweak each model by hand.

I am looking for recommendations and directions for solving this multi-series forecasting problem, preferably in R. Any help and support will be appreciated.

Thanks in advance.

I would suggest you use auto.arima from the forecast package.

This way you don't have to search for the right ARIMA model yourself.

auto.arima: Returns best ARIMA model according to either AIC, AICc or BIC value. The function conducts a search over possible models within the order constraints provided.

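library(forecast)             # provides auto.arima() and forecast()
fit <- auto.arima(WWWusage)   # automatic order selection on a built-in example series
plot(forecast(fit, h = 20))   # forecast 20 steps ahead and plot the result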

Instead of WWWusage you could pass one of your own time series to fit an ARIMA model. With forecast you then perform the forecast, in this case 20 time steps ahead (h = 20).

auto.arima basically chooses the ARIMA orders for you by minimizing an information criterion (AICc by default; AIC and BIC are also available).

You would have to try whether this is too computationally expensive for you. But in general it is not that uncommon to forecast that many time series.
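To get a feeling for the cost, you could time the fit on a handful of products before running all 15K. Below is a minimal sketch of that idea. The long-format table `prices_long`, its column names, and the helper `forecast_one` are made up for illustration, and the data are simulated, so adapt them to your own layout.

```r
library(forecast)

# Hypothetical stand-in data: one row per product and week (simulated prices)
set.seed(42)
n_weeks <- 5 * 52
prices_long <- data.frame(
  product = rep(c("A", "B", "C"), each = n_weeks),
  week    = rep(seq_len(n_weeks), times = 3),
  price   = 10 + cumsum(rnorm(3 * n_weeks, sd = 0.2))
)

# One price vector per product (rows are already ordered by week)
series_list <- split(prices_long$price, prices_long$product)

h <- 20  # forecast horizon in weeks

forecast_one <- function(x, h) {
  # Return NAs for a series whose fit fails instead of stopping the whole run
  tryCatch({
    # ts(x, frequency = 52) would let auto.arima consider yearly seasonality,
    # at a higher computational cost
    fit <- auto.arima(ts(x))
    as.numeric(forecast(fit, h = h)$mean)
  }, error = function(e) rep(NA_real_, h))
}

# Time the sample run to extrapolate the cost for all products
elapsed <- system.time(
  point_forecasts <- lapply(series_list, forecast_one, h = h)
)["elapsed"]

# h x n_products matrix of point forecasts, one column per product
fc_matrix <- do.call(cbind, point_forecasts)
dim(fc_matrix)
```

Since the series are fitted independently, the lapply call could be swapped for parallel::mclapply (with mc.cores set to more than 1 on Linux or macOS) if a single core turns out to be too slow.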

Another thing to keep in mind is that there may well be some cross-correlation between the time series. So from a forecasting precision standpoint it could make sense not to treat this as a purely univariate forecasting problem.
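A quick way to check for such cross-correlation between two products is ccf from base R. The sketch below uses two simulated series that share a common driver (an assumption made purely for illustration) and looks at the weekly price changes rather than the price levels.

```r
# Two simulated weekly price series with a shared driver (illustrative data only)
set.seed(7)
n_weeks <- 5 * 52
common  <- cumsum(rnorm(n_weeks, sd = 0.2))
price_a <- 10 + common + rnorm(n_weeks, sd = 0.1)
price_b <- 15 + common + rnorm(n_weeks, sd = 0.1)

# Difference first: correlating trending price levels directly tends to
# overstate the relationship
d_a <- diff(price_a)
d_b <- diff(price_b)

# Cross-correlation of the weekly changes at lags -8 to 8
cc <- ccf(d_a, d_b, lag.max = 8, plot = FALSE)

# Correlation of the changes within the same week (lag 0)
lag0 <- cc$acf[cc$lag == 0]
lag0
```

If many product pairs show clearly non-zero values here, that is a hint that the series carry information about each other.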

The setting sounds quite similar to the M5 forecasting competition that was recently held on Kaggle. The goal there was to produce point forecasts of the unit sales of various products sold in the USA by Walmart.

So there were a lot of sales time series to forecast. In this case the winner did not do a univariate forecast. Here is a link to a description of the winning solution. Since the setting seems so similar to yours, it probably makes sense to read a bit in the Kaggle forum for this challenge; there might even be useful notebooks (code examples) available.
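To illustrate what "not univariate" can mean in practice, here is a rough sketch of a single model shared by all products, fitted on lagged prices pooled across series. This is not the winning solution, just the general idea in its simplest form: the data are simulated, and the lag choice, the plain lm, and names like `make_rows` are assumptions for the example.

```r
set.seed(3)
n_weeks  <- 5 * 52
products <- c("A", "B", "C")

# Simulated AR(1)-like weekly prices per product (stand-in data)
series_list <- lapply(products, function(p) {
  as.numeric(10 + arima.sim(list(ar = 0.8), n = n_weeks, sd = 0.3))
})
names(series_list) <- products

# One training table across all products: each price with its two previous values
make_rows <- function(x, id) {
  n <- length(x)
  data.frame(product = id,
             y    = x[3:n],
             lag1 = x[2:(n - 1)],
             lag2 = x[1:(n - 2)])
}
train <- do.call(rbind, Map(make_rows, series_list, names(series_list)))

# One model for all series, with a product-specific intercept
global_fit <- lm(y ~ lag1 + lag2 + product, data = train)

# One-step-ahead forecast per product from its last two observations
newdata <- data.frame(
  product = products,
  lag1 = sapply(series_list, function(x) x[length(x)]),
  lag2 = sapply(series_list, function(x) x[length(x) - 1])
)
next_week <- predict(global_fit, newdata = newdata)
next_week
```

The point of pooling is that every series contributes to estimating the shared lag coefficients, which is also why this kind of setup scales more gracefully to thousands of products than tuning each series by hand.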
