My dataset has the following 3 columns:
date client_id sales
01/01/2012 client 1 $1000
02/01/2012 client 1 $900
...
...
12/01/2014 client 1 $1000
01/01/2012 client 2 $300
02/01/2012 client 2 $450
...
..
12/01/2014 client 2 $375
and so on for 98 other clients. In total I have around 100 clients, and the data for each client is a monthly time series (24 monthly datapoints per client).
How do I automatically forecast sales for all 100 clients using auto.arima in R? Is there a "by" statement option, or do I have to use loops?
Thanks
You can always use lapply():
lapply(tsMat, function(x) forecast(auto.arima(x)))
A little example follows:
library(forecast)
#generate some time-series:
sales <- replicate(100,
arima.sim(n = 24, list(ar = c(0.8), ma = c(-0.2)), sd = sqrt(0.1))
)
dates <- seq(as.Date("2012/1/1"), by = "month", length.out=24)
df <- data.frame(date=rep(dates,100), client_id=rep(1:100,each=24), sales=c(sales))
#reshape and convert it to a proper time-series format like ts:
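#(the date column is dropped so that only the 100 client series are forecast)
tsMat <- ts(reshape2::dcast(df, date ~ client_id, value.var = "sales")[, -1], start = c(2012, 1), frequency = 12)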
#forecast by auto.arima:
output <- lapply(tsMat, function(x) forecast(auto.arima(x)))
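With real data shaped like the question's, the date and sales columns would probably arrive as text and need converting before any of the above. A minimal sketch in base R, assuming month/day/year dates and a leading "$" on sales (the `raw` data frame here is hypothetical):

```r
# Hypothetical raw rows shaped like the question's data.
raw <- data.frame(
  date      = c("01/01/2012", "02/01/2012", "01/01/2012"),
  client_id = c("client 1", "client 1", "client 2"),
  sales     = c("$1000", "$900", "$300"),
  stringsAsFactors = FALSE
)

# Assumes month/day/year; use "%d/%m/%Y" instead if the dates are day-first.
raw$date <- as.Date(raw$date, format = "%m/%d/%Y")

# Strip "$" and any thousands separators, then convert to numeric.
raw$sales <- as.numeric(gsub("[$,]", "", raw$sales))

str(raw)
```

After this step the data frame has the same layout as the simulated df above.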
You can also set how many periods ahead to forecast by passing h = <number of periods> in the forecast() call, for example:
Forecast.allStates <- as.data.frame(lapply(ts.allStates, function(x) forecast(auto.arima(x),h=67)))
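If you would rather not reshape to a wide matrix first, another loop-free option is to split the long data frame by client and build one ts object per client. This is only a sketch on simulated data: the names n_clients, series, fc and point_fc are made up for illustration, and h = 12 is an arbitrary horizon.

```r
library(forecast)

# Simulated data in the same long layout as the question (3 clients only).
set.seed(1)
n_clients <- 3
dates <- seq(as.Date("2012-01-01"), by = "month", length.out = 24)
df <- data.frame(
  date      = rep(dates, n_clients),
  client_id = rep(seq_len(n_clients), each = 24),
  sales     = rnorm(24 * n_clients, mean = 100, sd = 10)
)

# One monthly ts per client; assumes rows are already in date order per client.
series <- lapply(split(df$sales, df$client_id), ts,
                 start = c(2012, 1), frequency = 12)

# Fit auto.arima and forecast 12 months ahead for each client.
fc <- lapply(series, function(x) forecast(auto.arima(x), h = 12))

# Point forecasts as a matrix: one row per horizon, one column per client.
point_fc <- sapply(fc, function(f) as.numeric(f$mean))
dim(point_fc)
```

Each element of fc is an ordinary forecast object, so plot(fc[["1"]]) or fc[["1"]]$lower work as they would for a single series.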
Another alternative could be tsibble and fable:
library(tsibble)
library(fable)
library(dplyr)
df %>%
as_tsibble(key = client_id, index = date) %>%
mutate(date = yearmonth(date)) %>%
model(arima = ARIMA(sales)) %>%
forecast(h = "1 year")
#> # A fable: 1,200 x 5 [1M]
#> # Key: client_id, .model [100]
#> client_id .model date sales .mean
#> <int> <chr> <mth> <dist> <dbl>
#> 1 1 arima 2014 gen N(0.072, 0.089) 0.0718
#> 2 1 arima 2014 feb N(0.28, 0.11) 0.281
#> 3 1 arima 2014 mar N(0.35, 0.12) 0.351
#> 4 1 arima 2014 apr N(0.024, 0.12) 0.0242
#> 5 1 arima 2014 mag N(-0.16, 0.12) -0.162
#> 6 1 arima 2014 giu N(0.029, 0.12) 0.0292
#> 7 1 arima 2014 lug N(0.24, 0.12) 0.243
#> 8 1 arima 2014 ago N(0.11, 0.12) 0.110
#> 9 1 arima 2014 set N(0.37, 0.12) 0.374
#> 10 1 arima 2014 ott N(0.37, 0.12) 0.369
#> # ... with 1,190 more rows
where df is:
set.seed(1)
sales <- replicate(100, arima.sim(n = 24, list(ar = c(0.8), ma = c(-0.2)), sd = sqrt(0.1)))
dates <- seq(as.Date("2012/1/1"), by = "month", length.out=24)
df <- data.frame(date=rep(dates,100), client_id=rep(1:100,each=24), sales=c(sales))