
Forecasting multiple time series in R using auto.arima

My dataset has the following 3 columns:

date client_id sales
01/01/2012 client 1 $1000
02/01/2012 client 1 $900
...
...
12/01/2014 client 1 $1000
01/01/2012 client 2 $300
02/01/2012 client 2 $450
...
..
12/01/2014 client 2 $375

and so on for 98 other clients, around 100 in total. The data is in time series format, with 24 monthly data points for each client.

How do I automatically forecast sales for all 100 clients using auto.arima in R? Is there a "by" statement option, or do I have to use loops?

Thanks

You can always use lapply():

lapply(tsMat, function(x) forecast(auto.arima(x)))

A little example follows:

library(forecast)
#generate some time-series:
sales <- replicate(100, 
    arima.sim(n = 24, list(ar = c(0.8), ma = c(-0.2)), sd = sqrt(0.1)) 
)
dates <- seq(as.Date("2012/1/1"), by = "month", length.out=24)
df <- data.frame(date=rep(dates,100), client_id=rep(1:100,each=24), sales=c(sales))
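#reshape to wide format (one column per client), drop the date column
#so it is not treated as a series, and convert to a multivariate ts:
wide <- reshape2::dcast(df, date ~ client_id, value.var = "sales")
tsMat <- ts(wide[, -1], start = c(2012, 1), frequency = 12)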
#forecast by auto.arima:
output <- lapply(tsMat, function(x) forecast(auto.arima(x)))
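
The question also asks about a "by"-style option. As a minimal sketch, assuming the same long-format data as in the example above (regenerated here with fewer clients so the block runs on its own) and assuming each client's rows are already in date order, split() can group the sales by client without reshaping first. The names by_client, fc_list and point_fc are illustrative only.

```r
library(forecast)

# Simulated long-format data, same shape as in the example above (assumed)
set.seed(1)
n_clients <- 5   # kept small so the sketch runs quickly
sales <- replicate(n_clients,
    arima.sim(n = 24, list(ar = 0.8, ma = -0.2), sd = sqrt(0.1)))
dates <- seq(as.Date("2012/1/1"), by = "month", length.out = 24)
df <- data.frame(date = rep(dates, n_clients),
                 client_id = rep(seq_len(n_clients), each = 24),
                 sales = c(sales))

# One numeric vector per client, in the order the rows appear
by_client <- split(df$sales, df$client_id)

# Fit and forecast each client separately, 12 months ahead
fc_list <- lapply(by_client, function(x) {
    y <- ts(x, start = c(2012, 1), frequency = 12)
    forecast(auto.arima(y), h = 12)
})

# Point forecasts as a 12 x n_clients matrix, one column per client
point_fc <- sapply(fc_list, function(f) as.numeric(f$mean))
```

This is still a loop under the hood, but it keeps the client ids as list names, which may make it easier to match forecasts back to clients.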

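You can also set the number of periods to forecast with the h argument of forecast(). In the call below, ts.allStates is that answer's own multivariate series (it is not defined here) and the horizon is 67 periods: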

Forecast.allStates <- as.data.frame(lapply(ts.allStates, function(x) forecast(auto.arima(x),h=67)))
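
Each element returned by forecast() carries the point forecasts in $mean and the interval bounds in $lower and $upper (80% and 95% by default). The following is a rough sketch of stacking those into one long data frame. The simulated series, the client_ names and the fc_long layout are assumptions made for illustration, not part of the original answers.

```r
library(forecast)

set.seed(1)
# Three short simulated monthly series standing in for clients (assumed data)
series <- lapply(1:3, function(i)
    ts(as.numeric(arima.sim(n = 24, list(ar = 0.8), sd = sqrt(0.1))),
       start = c(2012, 1), frequency = 12))
names(series) <- paste0("client_", 1:3)

h <- 6  # forecast horizon in months
fc_list <- lapply(series, function(y) forecast(auto.arima(y), h = h))

# Stack point forecasts and 95% bounds into one long data frame
fc_long <- do.call(rbind, lapply(names(fc_list), function(id) {
    f <- fc_list[[id]]
    data.frame(client_id = id,
               step = seq_len(h),
               mean = as.numeric(f$mean),
               lo95 = as.numeric(f$lower[, 2]),   # second column is the 95% level
               hi95 = as.numeric(f$upper[, 2]))
}))
```

A long table like this is usually easier to join back to client metadata than a wide data frame of forecast objects.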

Another alternative is to use the tsibble and fable packages:

library(tsibble)
library(fable)
library(dplyr)

df %>%
   as_tsibble(key = client_id, index = date) %>%
   mutate(date = yearmonth(date)) %>% 
   model(arima = ARIMA(sales)) %>% 
   forecast(h = "1 year")
#> # A fable: 1,200 x 5 [1M]
#> # Key:     client_id, .model [100]
#>    client_id .model     date           sales   .mean
#>        <int> <chr>     <mth>          <dist>   <dbl>
#>  1         1 arima  2014 Jan N(0.072, 0.089)  0.0718
#>  2         1 arima  2014 Feb   N(0.28, 0.11)  0.281 
#>  3         1 arima  2014 Mar   N(0.35, 0.12)  0.351 
#>  4         1 arima  2014 Apr  N(0.024, 0.12)  0.0242
#>  5         1 arima  2014 May  N(-0.16, 0.12) -0.162 
#>  6         1 arima  2014 Jun  N(0.029, 0.12)  0.0292
#>  7         1 arima  2014 Jul   N(0.24, 0.12)  0.243 
#>  8         1 arima  2014 Aug   N(0.11, 0.12)  0.110 
#>  9         1 arima  2014 Sep   N(0.37, 0.12)  0.374 
#> 10         1 arima  2014 Oct   N(0.37, 0.12)  0.369 
#> # ... with 1,190 more rows

where df is:

set.seed(1)
sales <- replicate(100, arima.sim(n = 24, list(ar = c(0.8), ma = c(-0.2)), sd = sqrt(0.1)))
dates <- seq(as.Date("2012/1/1"), by = "month", length.out=24)
df <- data.frame(date=rep(dates,100), client_id=rep(1:100,each=24), sales=c(sales))
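
To get prediction intervals or a plain table out of the fable, something along these lines may help. It is a sketch only: it rebuilds the same simulated df with fewer clients so it runs quickly, converts the index to yearmonth before building the tsibble, and uses hilo() to add a 95% interval column. The names fc_int and point_fc are illustrative.

```r
library(tsibble)
library(fable)
library(dplyr)

# Same simulated data as above, with fewer clients to keep the run short (assumed)
set.seed(1)
n_clients <- 3
sales <- replicate(n_clients,
    arima.sim(n = 24, list(ar = 0.8, ma = -0.2), sd = sqrt(0.1)))
dates <- seq(as.Date("2012/1/1"), by = "month", length.out = 24)
df <- data.frame(date = rep(dates, n_clients),
                 client_id = rep(seq_len(n_clients), each = 24),
                 sales = c(sales))

fc <- df %>%
    mutate(date = yearmonth(date)) %>%             # monthly index before building the tsibble
    as_tsibble(key = client_id, index = date) %>%
    model(arima = ARIMA(sales)) %>%
    forecast(h = "1 year")

# Add a 95% prediction interval column (named `95%`) to the forecasts
fc_int <- fc %>% hilo(level = 95)

# Plain table of point forecasts per client and month
point_fc <- fc %>% as_tibble() %>% select(client_id, date, .mean)
```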
