Rolling regression with expanding window

I am new to R and am trying to run rolling regressions with an expanding window (that is, for each date t, use all data up to t). The regressions have two independent variables and should be run separately for each group defined by a categorical column of the data frame.

For example, in the data frame below, I would like to extract the coefficients of lm(return ~ regress1 + regress2), grouped by category K, using all rows of the group up to and including the row of interest. Thus for row 2 the regression data set will be rows 1:2, for row 3 it will be rows 1:3, and for row 4 it will be just row 4, as it is the first row with categorical variable K = B.

myinput <- data.frame(K = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), 
                      date = c(1:3) , return = rnorm(9), regress1 = rnorm(9), regress2 = rnorm(9))

I found a very useful thread on this topic, "Rolling regression with expanding window in R", but I am having a difficult time applying it to my data set.

If anyone could help me understand how I need to adapt the approach they used it'd be very much appreciated. Thanks.

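Using myinput shown reproducibly in the Note at the end, define a function reg to perform the regression. When lm is given a data frame instead of a formula, it regresses the first column on the remaining columns, so reg only needs the columns return, regress1 and regress2, in that order. Then use rollapplyr with a width argument equal to date, making use of the fact that date is 1, 2, 3, etc. within each group and so equals the number of rows to regress over. Finally, cbind the result back to the original data frame.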

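library(zoo)

# lm() on a data frame regresses its first column on the remaining columns
reg <- function(x) coef(lm(as.data.frame(x)))

# columns 3:5 are return, regress1, regress2; width = date expands the window within each group
r <- rollapplyr(zoo(myinput[3:5]), myinput$date, reg, by.column=FALSE, coredata=FALSE)
cbind(myinput, coef = coredata(r))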

giving:

  K date      return   regress1   regress2 coef.(Intercept) coef.regress1 coef.regress2
1 A    1 -0.56047565 -0.4456620  0.7013559      -0.56047565            NA            NA
2 A    2 -0.23017749  1.2240818 -0.4727914      -0.47231761     0.1978137            NA
3 A    3  1.55870831  0.3598138 -1.0678237       0.15985654    -0.9479906    -1.6294374
4 B    1  0.07050839  0.4007715 -0.2179749       0.07050839            NA            NA
5 B    2  0.12928774  0.1106827 -1.0260044       0.15171486    -0.2026254            NA
6 B    3  1.71506499 -0.5558411 -0.7288912       1.05050327    -2.0789081     0.6735997
7 C    1  0.46091621  1.7869131 -0.6250393       0.46091621            NA            NA
8 C    2 -1.26506123  0.4978505 -1.6866933      -1.93165311     1.3389399            NA
9 C    3 -0.68685285 -1.9666172  0.8377870      -0.14625482     0.6376389     0.8515213
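
The width = date trick depends on date counting 1, 2, 3, etc. within each group. If the dates were real calendar dates, one option might be to compute each row's position within its group and pass that as the width instead. A minimal sketch, assuming the rows are already sorted by date within each K; the data frame df and its values are made up here for illustration:

```r
# Hypothetical data: the dates are not a 1, 2, 3 counter
df <- data.frame(K = c("A", "A", "A", "B", "B"),
                 date = as.Date(c("2020-01-31", "2020-02-29", "2020-03-31",
                                  "2020-01-31", "2020-02-29")))

# Position of each row within its group (assumes rows sorted by date within K)
width <- ave(seq_along(df$K), df$K, FUN = seq_along)
width
```

The resulting width vector could then take the place of myinput$date as the second argument of rollapplyr.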

Note

set.seed must be used prior to using random data in order to make the result reproducible. We used this:

set.seed(123)
myinput <- data.frame(K = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), 
  date = 1:3, return = rnorm(9), regress1 = rnorm(9), regress2 = rnorm(9))
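
For comparison, the same expanding-window coefficients can be sketched in base R without zoo, by splitting on K and refitting lm on rows 1 to i of each group. This is an alternative to the rollapplyr approach, not a replacement for it. The function name expanding_coef is made up here, and the sketch assumes the rows are sorted by K and then by date, so that the stacked results line up with the original rows:

```r
set.seed(123)
myinput <- data.frame(K = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  date = 1:3, return = rnorm(9), regress1 = rnorm(9), regress2 = rnorm(9))

# Expanding-window coefficients for one group's rows (hypothetical helper):
# row i of the result is fitted on rows 1..i of d
expanding_coef <- function(d) {
  t(sapply(seq_len(nrow(d)), function(i)
    coef(lm(return ~ regress1 + regress2, data = d[seq_len(i), ]))))
}

# Fit per group, then stack; relies on myinput being sorted by K
coefs <- do.call(rbind, lapply(split(myinput, myinput$K), expanding_coef))
colnames(coefs) <- paste0("coef.", colnames(coefs))
result <- cbind(myinput, coefs)
result
```

As in the rollapplyr output, coefficients that cannot be estimated from the rows available so far come back as NA.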
