I am new to R and am trying to run rolling regressions with an expanding window (that is, for each date t, use all data up to t) with two independent variables, on a data frame grouped by a categorical column.
For example, in the data frame below, I would like to extract the coefficients of lm(return ~ regress1 + regress2), grouped by category K, using all rows of the group up to the row of interest. Thus for row 2 the data set for the regression will be rows 1:2, for row 3 it will be rows 1:3, and for row 4 it will be just row 4, as that is the first row with categorical variable K = B.
myinput <- data.frame(K = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
date = c(1:3) , return = rnorm(9), regress1 = rnorm(9), regress2 = rnorm(9))
I found a very useful thread on this topic, "Rolling regression with expanding window in R", but I am having a difficult time applying it to my data set.
If anyone could help me understand how to adapt the approach used there, it would be very much appreciated. Thanks.
Using myinput, shown reproducibly in the Note at the end, define a function reg to perform the regression. Then use rollapplyr with a width argument equal to date, making use of the fact that date is 1, 2, 3, etc. within each group and so equals the number of rows to regress over. Finally, cbind the result back to the original data frame.
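library(zoo)
# Regress the first column of the window on the remaining columns and return the coefficients.
reg <- function(x) coef(lm(as.data.frame(x)))
# Columns 3:5 are return, regress1, regress2; the window width at each row is that row's date.
r <- rollapplyr(zoo(myinput[3:5]), myinput$date, reg, by.column=FALSE, coredata=FALSE)
cbind(myinput, coef = coredata(r))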
giving:
  K date      return   regress1   regress2 coef.(Intercept) coef.regress1 coef.regress2
1 A    1 -0.56047565 -0.4456620  0.7013559      -0.56047565            NA            NA
2 A    2 -0.23017749  1.2240818 -0.4727914      -0.47231761     0.1978137            NA
3 A    3  1.55870831  0.3598138 -1.0678237       0.15985654    -0.9479906    -1.6294374
4 B    1  0.07050839  0.4007715 -0.2179749       0.07050839            NA            NA
5 B    2  0.12928774  0.1106827 -1.0260044       0.15171486    -0.2026254            NA
6 B    3  1.71506499 -0.5558411 -0.7288912       1.05050327    -2.0789081     0.6735997
7 C    1  0.46091621  1.7869131 -0.6250393       0.46091621            NA            NA
8 C    2 -1.26506123  0.4978505 -1.6866933      -1.93165311     1.3389399            NA
9 C    3 -0.68685285 -1.9666172  0.8377870      -0.14625482     0.6376389     0.8515213
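The NA entries appear because a window with fewer rows than coefficients cannot identify all of them, so lm reports the unidentified ones as NA. As a cross-check that does not rely on date counting 1, 2, 3 within each group, the same expanding-window fit can be sketched in base R. This is only an illustrative alternative, not the method used above: the helper name expanding_coefs is made up for this example, and the sketch assumes the rows are already sorted by K and then by date.

```r
# Base-R sketch of an expanding-window regression within each group K.
# Assumption: rows are already ordered by K, then by date.
set.seed(123)
myinput <- data.frame(K = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  date = 1:3, return = rnorm(9), regress1 = rnorm(9), regress2 = rnorm(9))

# Hypothetical helper: for row i of one group, fit on rows 1..i of that group.
expanding_coefs <- function(d) {
  t(sapply(seq_len(nrow(d)), function(i) {
    coef(lm(return ~ regress1 + regress2, data = d[seq_len(i), , drop = FALSE]))
  }))
}

# Fit group by group, then stack the coefficient matrices in group order.
pieces <- lapply(split(myinput, myinput$K), expanding_coefs)
coefs <- do.call(rbind, pieces)
result <- cbind(myinput, coef = coefs)
print(result)
```

Because the window is built from row positions inside each group, this version should also work when date holds real calendar dates rather than a 1, 2, 3 counter, at the cost of refitting with an explicit loop.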
Note: set.seed must be called before generating random data in order to make the result reproducible. We used this:
set.seed(123)
myinput <- data.frame(K = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
date = 1:3, return = rnorm(9), regress1 = rnorm(9), regress2 = rnorm(9))
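To illustrate why the seed matters, here is a minimal check that resetting the seed reproduces the same draws. The variable names a and b are arbitrary and used only for this example.

```r
# Resetting the seed before each draw yields identical random numbers.
set.seed(123)
a <- rnorm(3)
set.seed(123)
b <- rnorm(3)
print(identical(a, b))
```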