
Rolling Regression by Group

Hi, I have a panel data set. I'd like to run a rolling-window regression for each firm and extract the coefficient of the independent variable. y is the dependent variable and x is the independent variable. The rolling window is 12 rows: the first regression uses rows 1 to 12, the second uses rows 2 to 13, and so on. I am using rollapply.
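To make the windowing concrete, here is a minimal base-R sketch for a single firm. The helper name rolling_slope is made up for illustration, and it relies on the fact that the OLS slope of y on x (with an intercept) equals cov(x, y) / var(x). It is only meant to show which rows go into which window, not to be fast.

```r
# Hypothetical helper: slope of y ~ x over each trailing window of `width` rows.
rolling_slope <- function(x, y, width = 12L) {
  n <- length(x)
  out <- rep(NA_real_, n)            # positions before a full window stay NA
  if (n < width) return(out)
  for (end in width:n) {
    idx <- (end - width + 1L):end    # rows 1..12, then 2..13, and so on
    out[end] <- cov(x[idx], y[idx]) / var(x[idx])  # OLS slope with intercept
  }
  out
}

set.seed(1)
x <- rnorm(24, 5, 2)
y <- rnorm(24, 10, 5)
beta <- rolling_slope(x, y)

# Compare the first full window with lm() on rows 1 to 12.
all.equal(beta[12], unname(coef(lm(y[1:12] ~ x[1:12]))[2]))
```

The first 11 entries of beta are NA, which is the layout the right-aligned rollapply call is expected to produce for each firm.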

Here is a question with the exact same error that I encountered: Rolling by group in data.table R. That question only involves one column, whereas my regression needs two, so I can't adapt the recommended answer from that post. Here is another post that uses a for loop: rolling regression with dplyr. My real data has more than 2 million observations, so that approach is too slow. Can anyone help?

My fake data set is as follows:

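library(dplyr)      # group_by, mutate, %>%
library(data.table) # as.data.table, :=
library(zoo)        # rollapply

dt<-rep(c("AAA","BBB","CCC"),each=24)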
dt<-as.data.frame(dt)
names(dt)[names(dt)=="dt"] <- "firm"
a<-c(20100131,20100228,20100331,20100430,20100531,20100630,20100731,20100831,20100930,20101031,20101130,20101231,20110131,20110228,20110331,20110430,20110531,20110630,20110731,20110831,20110930,20111031,20111130,20111231)
dt$time<-rep(a,3)
dt<-dt%>% group_by(firm)%>%
  mutate(y=rnorm(24,10,5))
dt<-dt%>% group_by(firm)%>%
  mutate(x=rnorm(24,5,2))
dt<-as.data.table(dt)

I tried this code:

# create rolling regression function
roll <- function(Z)
{ 
  t = lm(formula=y~x, data = as.data.frame(Z), na.rm=T); 
  return(t$coef[2]) 
}
dt[,beta := rollapply(dt, width=12, roll, fill=NA, by.column=FALSE, align="right") , by=firm]

I am trying to create a column called "beta" that holds the coefficient of x. For each firm, the first non-missing value should appear at the 12th observation.

It looks like the regression takes x and y starting from the 1st row of the whole table for every group, and the coefficients are a bit off compared with the results I got from Excel.

The second method I tried is the dplyr version:

dt %>%
group_by(firm) %>%
mutate(dt,beta = rollapply(dt,12,function(x) coef(lm(y~x,data=as.data.frame(x)))[2],by.column= FALSE, fill = NA, align = "right"))

It gives me the same issue: every group gets the same numbers. It looks like, for each firm, the regression takes y and x starting from the 1st row of the data.

Any thoughts? Thank you so much.

Here is a solution that uses the rollRegres and data.table packages. I have also added a modified version of the OP's solution that works (see eddi's comment), and I use an example with 2 million observations, as the OP mentions.

#####
# setup data
library(rollRegres)
library(data.table)
library(dplyr)

set.seed(33700919)
n_firms <- 83334 # yields roughly the 2M rows that the OP mentions
dt <- rep(1:n_firms, each = 24)
dt <- data.frame(firm = dt)
a <-c(20100131,20100228,20100331,20100430,20100531,20100630,20100731,20100831,20100930,20101031,20101130,20101231,20110131,20110228,20110331,20110430,20110531,20110630,20110731,20110831,20110930,20111031,20111130,20111231)
dt$time <- rep(a, n_firms)
dt <- dt %>% group_by(firm) %>% mutate(y=rnorm(24,10,5))
dt <- dt %>% group_by(firm) %>% mutate(x=rnorm(24,5,2))
dt <- as.data.table(dt)
nrow(dt) # roughly the 2M rows that the OP mentions
#R [1] 2000016

#####
# fit models
setkey(dt, firm, time) # make sure data is sorted correctly
start_time <- Sys.time() # to show computation time
dt[
  , beta :=
    roll_regres.fit(x = cbind(1, .SD[["x"]]), y = .SD[["y"]],
                    width = 12L)$coefs[, 2],
  by = firm]
Sys.time() - start_time
#R Time difference of 6.526595 secs

# gives the same as OP's solution with minor corrections
library(zoo)
start_time <- Sys.time()
roll <- function(Z)
  lm.fit(x = cbind(1, Z[, "x"]), y = Z[, "y"])$coef[2]
dt[
  , beta_zoo :=
    rollapply(.SD, width=12, roll, fill=NA, by.column=FALSE, align="right"),
  by=firm]
Sys.time() - start_time # much slower
#R Time difference of 1.87341 mins

# gives the same
all.equal(dt$beta, dt$beta_zoo)
#R [1] TRUE
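The key change relative to the OP's call is passing .SD instead of dt to rollapply. A likely explanation for the identical values in every group is that, inside j, the name of the table still refers to the whole table, while .SD is the subset for the current group. The toy table below is made up purely to illustrate that difference:

```r
library(data.table)

# Toy data (hypothetical): two firms with three rows each.
toy <- data.table(firm = rep(c("AAA", "BBB"), each = 3), x = 1:6)

# Inside `j` with `by = firm`, `toy` is the full table,
# whereas `.SD` holds only the rows of the current group.
counts <- toy[, .(rows_in_toy = nrow(toy), rows_in_SD = nrow(.SD)), by = firm]
print(counts)
```

So a rolling function applied to the full table inside by = firm sees the same rows for every firm, which matches the symptom described in the question.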

Maybe you can try changing the first argument of rollapply: replace dt with just the columns you need, e.g. dt[, c("y","x")], and see if it works.
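One way to restrict the rolling window to the two regression columns while still working per firm is to combine .SD with .SDcols. This is only a sketch on a small made-up table (the names toy and slope are assumptions, not from the posts above); as.matrix is used so the window function can index columns by name.

```r
library(data.table)
library(zoo)

set.seed(42)
# Toy panel (hypothetical): 3 firms, 24 periods each.
toy <- data.table(
  firm = rep(c("AAA", "BBB", "CCC"), each = 24),
  time = rep(1:24, times = 3),
  y    = rnorm(72, 10, 5),
  x    = rnorm(72, 5, 2)
)
setkey(toy, firm, time)  # sort by firm, then time

# Slope of y on x (with intercept) for one window, given as a matrix.
slope <- function(Z) lm.fit(x = cbind(1, Z[, "x"]), y = Z[, "y"])$coefficients[2]

# .SD contains only the y and x columns of the current firm.
toy[, beta := rollapply(as.matrix(.SD), width = 12, FUN = slope,
                        by.column = FALSE, fill = NA, align = "right"),
    by = firm, .SDcols = c("y", "x")]

# Spot check: 12th row of firm BBB against lm() on that firm's first 12 rows.
b <- toy[firm == "BBB"]
all.equal(unname(b$beta[12]), unname(coef(lm(y ~ x, data = b[1:12]))[2]))
```

Each firm should end up with 11 leading NA values followed by one slope per full window.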

