简体   繁体   中英

Multiple linear regression by group in a rolling window in R

My dataframe looks like this:

Date = c(rep(as.Date(seq(15000,15012)),2))
Group = c(rep("a",13),rep("b",13))
y = c(seq(1,26,1))
x1 = c(seq(0.01,0.26,0.01))
x2 = c(seq(0.02,0.26*2,0.02))
df = data.frame(Group,Date,y,x1,x2)

head(df,3)
Group Date y x1 x2
a 2011-01-26 1 0.01 0.02
a 2011-01-27 2 0.02 0.04
a 2011-01-28 3 0.03 0.06

And I would like to do multiple regression by group (y as the dependent variable and x1, x2 as the independent variables) in a rolling window ie 3.

I have tried to achieve this using packages tidyverse and zoo with following codes but failed.

  ## define multi-var-linear regression function and get the residual
  rsd <- function(df){
    lm(formula = y~x1+x2, data = as.data.frame(df), na.action = na.omit) %>%
      resid() %>%
      return()
  }
  ## apply it by group with rolling window
  x <- df %>% group_by(Group) %>%
    rollapplyr(. , width = 3, FUN = rsd)

The output of this code is not what I acutually want.

Does anyone know how to do multiple regression by group in a rolling window? Thanks in advance, Giselle

Thank Grothendieck and Marcus for your codes! It really helped me a lot:) I now appened them here:

# Grothendieck method
rsd <- function(df){
  lm(formula = y~x1+x2, data = as.data.frame(df), na.action = na.omit) %>%
    resid() %>%
    return()
}

width <- 5
df_m2 <-
  df %>% 
  group_by(Group) %>%
  group_modify(~ {
    cbind(., rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA,
                        by.column = FALSE))
  }) %>%
  ungroup %>%
  select(c("Group","Date","5")) %>%
  dplyr::rename(residual_m2 = "5")
# Marcus method
output <- data.frame()
for (i in unique(df$Group)) {
  a = df%>% subset(Group==i)
  a[,"residual"] = NA
  max = nrow(a)
  if(max<5){
    next
  }
  for (j in seq(5,max,by=1)) {
    b = a %>% slice((j-4):j)
    lm_ = lm(y~x1+x2, data = b)
    a[j,]$residual = residuals(lm_)[5]
  }
  output <-
    output %>%
    rbind(a)
}

A good old-fashioned for-loop here could be:

for (i in unique(df$Group)){
  for (j in (seq(15000,15012, 3))){
      lm_ <- lm(formula = df[df$Group== i & df$Date %in% c(j, j+1, j+2), 3] ~ df[df$Group== i & df$Date %in% c(j, j+1, j+2), 4] + df[df$Group== i & df$Date %in% c(j, j+1, j+2), 5], na.action = na.omit)
      print(paste('Group', i, 'Dates from', j, 'to', j+3, residuals(lm_)))
  }
}

Use group_modify and use rollapplyr with the by.column = FALSE argument so that rsd is applied to all columns at once rather than one at a time.

Note that if you use width 3 with two predictors and an intercept the residuals will necessarily be all zero so we changed the width to 5.

library(dplyr, exclude = c("lag", "filter"))
library(zoo)

width <- 5

df %>% 
  group_by(Group) %>%
  group_modify(~ {
      cbind(., rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA,
          by.column = FALSE))
  }) %>%
  ungroup

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM