I want to make an rolling stepwise regression with dplyr , do() and rollapply() . My code for the data looks like this:
FUND_DATA <- tibble(
DATE = 1:10,
FUND1 = rnorm(10),
FUND2 = rnorm(10),
FUND3 = rnorm(10),
FUND4 = rnorm(10))
These are just sime price quates from funds for period 1-10. For the independet variables it looks the same:
FACTORS <- tibble(
DATE = 1:10,
x1 = rnorm(10),
x2 = rnorm(10),
x3 = rnorm(10),
x4 = rnorm(10))
Now I make merge the the two tibbles from above as following:
REG_DATA <- FUND_DATA %>%
pivot_longer(contains("FUND"), names_to = "FUND",
values_to = "PRICE") %>% arrange(FUND,DATE) %>% left_join(., FACTORS, by = "DATE") %>%
group_by(FUND) %>% mutate(RET = PRICE/lag(PRICE)-1) %>% drop_na()
So I have some long tibble and grouped by the FUND.
A tibble: 36 x 8
# Groups: FUND [4]
DATE FUND PRICE x1 x2 x3 x4 RET
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 FUND1 -1.19 -0.422 -0.872 -0.292 -0.176 -2.04
2 3 FUND1 -0.869 1.60 0.247 -0.610 0.170 -0.272
3 4 FUND1 -1.60 0.159 -0.757 0.730 -0.154 0.839
4 5 FUND1 -1.58 -0.688 -0.718 0.778 0.879 -0.0103
5 6 FUND1 1.14 -0.00190 -0.956 1.14 -0.953 -1.72
6 7 FUND1 -0.452 0.730 -0.344 0.925 -0.593 -1.40
7 8 FUND1 -0.809 0.895 -0.987 -0.0791 -0.0133 0.792
8 9 FUND1 1.06 -0.503 1.06 1.96 0.362 -2.31
9 10 FUND1 0.0358 0.359 -0.370 1.27 0.129 -0.966
10 2 FUND2 -0.525 -0.422 -0.872 -0.292 -0.176 -0.229
# ... with 26 more rows
On this data I want to perform a rolling stepwise regression for each fund and store the R^2 for each rolling window and fund. So for each window ths should perform a stepwise regression. I came up with the folling code:
ROLLING <- REG_DATA %>% group_by(FUND) %>% do(R2 = rollapply(., width = 2, function(x){
summary(step(lm(RET ~ x1+x2+x3+x4,
data = .), direction = "both", trace = 0))$r.squared
},by.column = FALSE,align = "right"))
The code is running without errors but the output is the problem. This code only stores the R^2 from the last rolling window (period 8-10) and overwrite the others I think, so it looks like this:
FUND1 c(0.675, 0.675, 0.675,...)
FUND2 c(0.447, 0.447, 0.447,...)
FUND3 .....
Can you guys help me so that the codes stores the R^2 for each window?
I have one possible solution for your task though it does not use do() nor step(). The approach is to separate the FUNDS in individual list items, convert it to a daily timeseries and work from there:
library(dplyr)
library(tidyr)
library(zoo)
library(purrr)
library(plyr)
# your dummy data
FUND_DATA <- tibble(
DATE = 1:10,
FUND1 = rnorm(10),
FUND2 = rnorm(10),
FUND3 = rnorm(10),
FUND4 = rnorm(10))
# your dummy data
FACTORS <- tibble(
DATE = 1:10,
x1 = rnorm(10),
x2 = rnorm(10),
x3 = rnorm(10),
x4 = rnorm(10))
# first part of your code (had to split it to use it later for naming)
REG_DATA <- FUND_DATA %>%
tidyr::pivot_longer(contains("FUND"), names_to = "FUND",
values_to = "PRICE") %>%
dplyr::arrange(FUND,DATE) %>%
dplyr::left_join(., FACTORS, by = "DATE")
# make it o a list of timeseries
lts <- REG_DATA %>%
# core data of timeseries is a matrix and allows only one data type (we prefer numeric thus cut "FUND" and preserve only the number)
dplyr::mutate(FUND = as.numeric(substr(FUND, 5, 5))) %>%
group_by(FUND) %>%
mutate(RET = PRICE/lag(PRICE)-1) %>%
drop_na() %>%
# split by groups into list items
dplyr::group_split() %>%
# convert each list item to a time series with starting date and length according to each list item
purrr::map( ~ xts::xts(.x, order.by = seq(as.Date("2020-01-01"), as.Date("2020-01-01") + length(.x), by = 1)))
# map the rollapply to the timeseries and extract R² => !!! width should be larger than 2 because you have 4 explanatory variables (6 seems to be the minimum)
res <- purrr::map(lts, ~ rollapply(.x,width = 6,
FUN = function(x)
summary(lm(RET ~ x1+x2+x3+x4, data = as.data.frame(x)))$r.squared,
by.column = FALSE, align = "right"))
# deconstruct the time series to a data.frame (there might be a better way)
res2 <- purrr::map(res, ~ data.frame(TS = zoo::index(.x),
R2 = zoo::coredata(.x)))
# get the unqiue FUND names and assing as list item names (you could use a vector instead)
names(res2) <- unique(REG_DATA$FUND)
# condense the list items to a data.frame using the before assinged names as a row
plyr::ldply(res2)
.id TS R2
1 FUND1 2020-01-01 NA
2 FUND1 2020-01-02 NA
3 FUND1 2020-01-03 NA
4 FUND1 2020-01-04 NA
5 FUND1 2020-01-05 NA
6 FUND1 2020-01-06 0.3556052
7 FUND1 2020-01-07 0.7670353
8 FUND1 2020-01-08 0.9077215
9 FUND1 2020-01-09 0.9758644
10 FUND2 2020-01-01 NA
11 FUND2 2020-01-02 NA
12 FUND2 2020-01-03 NA
13 FUND2 2020-01-04 NA
14 FUND2 2020-01-05 NA
15 FUND2 2020-01-06 0.8021993
16 FUND2 2020-01-07 0.8755639
17 FUND2 2020-01-08 0.8206098
18 FUND2 2020-01-09 0.8296576
19 FUND3 2020-01-01 NA
20 FUND3 2020-01-02 NA
21 FUND3 2020-01-03 NA
22 FUND3 2020-01-04 NA
23 FUND3 2020-01-05 NA
24 FUND3 2020-01-06 0.4545569
25 FUND3 2020-01-07 0.4172101
26 FUND3 2020-01-08 0.3604151
27 FUND3 2020-01-09 0.9877962
28 FUND4 2020-01-01 NA
29 FUND4 2020-01-02 NA
30 FUND4 2020-01-03 NA
31 FUND4 2020-01-04 NA
32 FUND4 2020-01-05 NA
33 FUND4 2020-01-06 0.9541878
34 FUND4 2020-01-07 0.9973588
35 FUND4 2020-01-08 0.9991080
36 FUND4 2020-01-09 0.9965382
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.