
R auto.arima forecast

I want to create a forecast, and I chose auto.arima. After fitting the model, I can't work out how to calculate forecasts for two or more articles:

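library(forecast)

# build a monthly series, then replace outliers and fill missing values
my_forecast <- ts(frc$sales_30, frequency = 12)
my_forecast <- tsclean(my_forecast)

# fit an ARIMA model with automatic order selection
fit <- auto.arima(my_forecast)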

But I have 100 articles, and I need a forecast for each of them (data format: Year, Month, Sales, Article).

The typical workflow in R for this task is list-wise: you split your data by article into list items and apply functions to each of them. As you may have noticed already, the year and month columns are not needed to build the series, because the time index is generated from the frequency argument of the ts() function.

Therefore this sample works with only two articles, A and B, and their imaginary monthly sales vectors, which we assume have already been sorted by date.
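
As a small illustration of the point about frequency (a sketch with made-up numbers; the start of January 2020 is an assumption used only for this example), ts() assigns the time index purely by position, so the input has to be in chronological order already:

```r
# made-up monthly sales, assumed to be sorted by date already
sales <- c(20, 35, 50, 25, 40, 30, 45, 20, 35, 50, 25, 40)

# without 'start', the series begins at period 1 of "year" 1
x <- ts(sales, frequency = 12)
start(x)  # 1 1
end(x)    # 1 12

# an optional 'start' (assumed here: January 2020) only relabels the index
y <- ts(sales, start = c(2020, 1), frequency = 12)
start(y)  # 2020 1
end(y)    # 2020 12
```

This is also why forecasts built on 24 months of history without a start value are labelled Jan 3 and Feb 3.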

I will not dive into the technicalities of time-series analysis and prediction; I focus mainly on the process and code for making multiple predictions from a df that contains all articles (or any other grouping level) and their sales history. I did not use the tsclean() function, but it should be evident from the workflow how to include it:

library(forecast)
library(tidyverse)
# set up some dummy data (has no clear pattern in terms of seasonality etc. but works for the demo)
## bear in mind that this data is randomly generated, so you will most likely not reproduce my numbers unless you set a seed with set.seed() first
df <- data.frame(article = c(rep("A", 24), rep("B", 24)), 
                 sales = c(sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE),
                           sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE)))
# build grouping inside the df/tibble
dfg <- df %>% 
    dplyr::group_by(article) 
# split the new df by grouping criteria into list
dfl <- dfg %>%
    dplyr::group_split(.keep = FALSE)
# set list names according to article value (not needed but might be helpful for you)
names(dfl) <- dplyr::group_keys(dfg)$article
# apply ts function with frequency 12 to the list items
dflt <- lapply(dfl, ts, frequency = 12)
# apply the auto.arima to build list of models
dfltm <- lapply(dflt, forecast::auto.arima)
# apply forecast with horizon 2 on the list of final models from auto.arima
predictions <- lapply(dfltm, forecast::forecast, h = 2)
# print results
predictions 

$A
      Point Forecast    Lo 80    Hi 80    Lo 95   Hi 95
Jan 3       34.79167 22.47636 47.10697 15.95703 53.6263
Feb 3       34.79167 22.47636 47.10697 15.95703 53.6263

$B
      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jan 3       34.58333 20.32802 48.83865 12.78171 56.38496
Feb 3       34.58333 20.32802 48.83865 12.78171 56.38496
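
The tsclean() step from the question could be added as one more lapply() call between ts() and auto.arima(). This is a minimal sketch using the same kind of dummy data as above. The fixed seed and the artificial missing value are assumptions added only to make the example self-contained:

```r
library(forecast)

set.seed(1)  # assumption: fixed seed so the dummy data is repeatable
sales_a <- sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE)
sales_b <- sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE)
sales_b[10] <- NA  # artificial gap to give tsclean() something to do

# named list with one monthly series per article
series <- lapply(list(A = sales_a, B = sales_b), ts, frequency = 12)

# tsclean() replaces outliers and interpolates missing values
cleaned <- lapply(series, forecast::tsclean)

# fit and forecast exactly as before
models <- lapply(cleaned, forecast::auto.arima)
predictions <- lapply(models, forecast::forecast, h = 2)
predictions
```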

A more modern way of doing the same thing is to work with nested lists inside a tibble:

# build list inside the tibble/df by existing groupings
npd <- tidyr::nest(dfg) %>%
    # generate new column of ts series data
    dplyr::mutate(tsdata = purrr::map(data, ~ ts(.x, frequency = 12)),
                  # use auto.arima on the data to build new column of final auto.arima models
                  models = purrr::map(tsdata, ~ forecast::auto.arima(.x)),
                  # generate forecast as new column
                  predictions = purrr::map(models, ~ forecast::forecast(.x, h = 2)))
# print prediction results
npd$predictions
[[1]]
      Point Forecast    Lo 80    Hi 80    Lo 95   Hi 95
Jan 3       34.79167 22.47636 47.10697 15.95703 53.6263
Feb 3       34.79167 22.47636 47.10697 15.95703 53.6263

[[2]]
      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jan 3       34.58333 20.32802 48.83865 12.78171 56.38496
Feb 3       34.58333 20.32802 48.83865 12.78171 56.38496
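
To get back to the layout asked for in the question (Year, Month, Sales, Article), the point forecasts can be pulled out of each forecast object. This is a sketch on fresh dummy data. The helper name forecast_to_df, the fixed seed and the start of January 2020 are assumptions for illustration:

```r
library(forecast)

set.seed(1)  # assumption: fixed seed for repeatable dummy data
history <- list(
  A = ts(sample(seq(20, 50, by = 5), 24, replace = TRUE),
         start = c(2020, 1), frequency = 12),
  B = ts(sample(seq(20, 50, by = 5), 24, replace = TRUE),
         start = c(2020, 1), frequency = 12)
)
predictions <- lapply(history, function(x) forecast(auto.arima(x), h = 2))

# hypothetical helper: turn one forecast object into a plain data frame
forecast_to_df <- function(fc, article) {
  point <- fc$mean  # ts object holding the point forecasts
  data.frame(
    Year    = as.integer(floor(time(point) + 1e-8)),  # guard against rounding
    Month   = as.integer(cycle(point)),
    Sales   = as.numeric(point),
    Article = article
  )
}

# one row per article and forecast month
result <- do.call(rbind, Map(forecast_to_df, predictions, names(predictions)))
rownames(result) <- NULL
result
```

The prediction intervals are stored in the lower and upper elements of each forecast object and could be added as further columns in the same way.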

As mentioned initially, the ts() function works based on frequency, not on a date column. This means you have to make sure that months with no sales are listed and that every article has a complete timeline in ascending chronological order. Missing values have to be included before forming the time-series object.
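
One possible way to enforce a complete timeline is tidyr::complete(), which adds the missing article/month combinations before the series is built. This is a sketch with invented data. The column names year, month and sales, and the choice to treat a missing month as zero sales, are assumptions:

```r
library(dplyr)
library(tidyr)

# invented sales history: article B has no row for February 2023
raw <- data.frame(
  article = c("A", "A", "A", "B", "B"),
  year    = c(2023, 2023, 2023, 2023, 2023),
  month   = c(1, 2, 3, 1, 3),
  sales   = c(20, 35, 30, 45, 25)
)

full <- raw %>%
  # add every article/year/month combination that is absent, with sales = 0
  tidyr::complete(article, year, month, fill = list(sales = 0)) %>%
  # ts() relies on row order, so sort chronologically within each article
  dplyr::arrange(article, year, month)

# one monthly series per article, labelled from the first listed month
series <- lapply(split(full, full$article), function(d) {
  ts(d$sales, start = c(d$year[1], d$month[1]), frequency = 12)
})
series$B
```

Note that complete() only crosses values that occur somewhere in the data, so a month that is missing for every article would still be absent and would need an explicitly supplied range of months.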

Finally, I highly recommend the open book by the author of the forecast package, which can be found here: https://otexts.com/fpp2/

