
Group by with the forecast package in R

I am working on several analyses in which I want to forecast a numeric value for each level of a factor, or for each combination of several factors, e.g. conditioning on sex and age. My process so far has been fairly manual, as shown below, which is fine for a single factor with, say, 2-5 levels. But it does not scale to factors with many levels or to multiple factors.

Is there any kind of "group by" or "subset" functionality within the forecast package that would help? I started writing a program to handle the process below in the most general case (i.e. for any number of factors and levels), but have not had much success yet.

BTW, unfortunately my data is private and I cannot share it here. That shouldn't really matter, though, because the code below works and I'm looking for a better, i.e. scalable, solution.

# Example code

# category is a factor with levels A and B; amount is the variable to model/forecast
# using data.table syntax to create a vector for each category
vec1 <- dt[category == 'A']$amount
vec2 <- dt[category == 'B']$amount

# Create ts objects from above vectors
ts1 <- ts(vec1, start=c(start_year, start_month), end=c(end_year, end_month), frequency=12)
ts2 <- ts(vec2, start=c(start_year, start_month), end=c(end_year, end_month), frequency=12)

# Fit model 
fit1 <- auto.arima(ts1, trace = TRUE, stepwise = FALSE)
fit2 <- auto.arima(ts2, trace = TRUE, stepwise = FALSE)


# Forecast out using selected models
h <- 12
fcast1 <- forecast(fit1, h = h)
fcast2 <- forecast(fit2, h = h)

# funggcast pulls out data from the forecast object into a df (needed for ggplot2)
# output columns are date, observed, fitted, forecast, lo80, hi80, lo95, hi95
fcastdf1 <- funggcast(ts1, fcast1)
fcastdf2 <- funggcast(ts2, fcast2)

# Add in category
fcastdf1$category <- 'A'
fcastdf2$category <- 'B'


# Merge into one df
df <- merge(fcastdf1, fcastdf2, all = TRUE)

# Basic qplot from ggplot2 package, I am actually incorporating quite a bit more formatting but this is just to give an idea
qplot(x=date, 
      y=observed, 
      data=df, 
      color=category, 
      group=category, geom="line") +
geom_line(aes(y=forecast), col='blue')

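You can do this with tapply:

  res <- tapply(dt$amount, dt$category, function(x) {
    # Monthly series for one category; start_year, start_month and h are as defined in the question
    series <- ts(x, start = c(start_year, start_month), frequency = 12)
    fit <- auto.arima(series, trace = TRUE, stepwise = FALSE)
    # Return the forecast object for this category
    forecast(fit, h = h)
  })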

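This will return a list-like array of forecast objects, one per level of category and named by level, so a single forecast can be pulled out with, for example, res[["A"]].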

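You will have to set the start to be the earliest date in your data set. Note that this assumes every category starts in that same month and that the rows for each category are already in date order.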
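
For the multiple-factor case mentioned in the question (e.g. sex and age), the same idea can be sketched with split() and lapply(), collecting the results into one data frame for ggplot2. This is only a rough sketch under some assumptions: the toy dt, the column names sex, age and month, and the start value are all made up for illustration, every group is assumed to be a complete monthly series with the same start, and auto.arima is called with its defaults to keep it quick.

```r
library(forecast)

# Hypothetical toy data standing in for the private data set:
# 48 monthly observations for each sex/age combination.
set.seed(1)
dt <- expand.grid(month = 1:48, sex = c("F", "M"), age = c("young", "old"),
                  stringsAsFactors = FALSE)
dt$amount <- 100 + dt$month + rnorm(nrow(dt), sd = 5)

h <- 12
start <- c(2015, 1)  # assumed common start (year, month) for every group

# split() on a list of factors gives one data frame per combination of levels
groups <- split(dt, list(dt$sex, dt$age), drop = TRUE)

fc_list <- lapply(groups, function(g) {
  g <- g[order(g$month), ]                      # make sure rows are in time order
  series <- ts(g$amount, start = start, frequency = 12)
  fit <- auto.arima(series)
  fc <- forecast(fit, h = h)                    # default levels are 80% and 95%
  data.frame(
    sex      = g$sex[1],
    age      = g$age[1],
    date     = as.numeric(time(fc$mean)),       # decimal years
    forecast = as.numeric(fc$mean),
    lo80     = as.numeric(fc$lower[, 1]),
    hi80     = as.numeric(fc$upper[, 1]),
    lo95     = as.numeric(fc$lower[, 2]),
    hi95     = as.numeric(fc$upper[, 2])
  )
})

# Stack the per-group forecasts into a single data frame
fcast_df <- do.call(rbind, fc_list)
```

fcast_df then has h rows per sex/age combination, so it can be passed straight to ggplot2 with color or facets mapped to sex and age. If the groups start at different dates, start would have to be computed inside the function from each group's own rows instead of being shared.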
