简体   繁体   中英

Rolling stepwise regression with dplyr

I want to make an rolling stepwise regression with dplyr , do() and rollapply() . My code for the data looks like this:

    FUND_DATA <- tibble(
  DATE = 1:10,
  FUND1 = rnorm(10),
  FUND2 = rnorm(10),
  FUND3 = rnorm(10),
  FUND4 = rnorm(10))

These are just sime price quates from funds for period 1-10. For the independet variables it looks the same:

FACTORS <- tibble(
  DATE = 1:10,
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10),
  x4 = rnorm(10))

Now I make merge the the two tibbles from above as following:

REG_DATA <- FUND_DATA %>%
  pivot_longer(contains("FUND"),  names_to = "FUND", 
  values_to = "PRICE") %>% arrange(FUND,DATE) %>% left_join(., FACTORS, by = "DATE") %>%  
  group_by(FUND) %>% mutate(RET = PRICE/lag(PRICE)-1) %>% drop_na()

So I have some long tibble and grouped by the FUND.

  A tibble: 36 x 8
# Groups:   FUND [4]
    DATE FUND    PRICE       x1     x2      x3      x4     RET
   <int> <chr>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
 1     2 FUND1 -1.19   -0.422   -0.872 -0.292  -0.176  -2.04  
 2     3 FUND1 -0.869   1.60     0.247 -0.610   0.170  -0.272 
 3     4 FUND1 -1.60    0.159   -0.757  0.730  -0.154   0.839 
 4     5 FUND1 -1.58   -0.688   -0.718  0.778   0.879  -0.0103
 5     6 FUND1  1.14   -0.00190 -0.956  1.14   -0.953  -1.72  
 6     7 FUND1 -0.452   0.730   -0.344  0.925  -0.593  -1.40  
 7     8 FUND1 -0.809   0.895   -0.987 -0.0791 -0.0133  0.792 
 8     9 FUND1  1.06   -0.503    1.06   1.96    0.362  -2.31  
 9    10 FUND1  0.0358  0.359   -0.370  1.27    0.129  -0.966 
10     2 FUND2 -0.525  -0.422   -0.872 -0.292  -0.176  -0.229 
# ... with 26 more rows

On this data I want to perform a rolling stepwise regression for each fund and store the R^2 for each rolling window and fund. So for each window ths should perform a stepwise regression. I came up with the folling code:

ROLLING <- REG_DATA %>% group_by(FUND) %>% do(R2 = rollapply(., width = 2, function(x){
  summary(step(lm(RET ~ x1+x2+x3+x4, 
                  data = .), direction = "both", trace = 0))$r.squared
  },by.column = FALSE,align = "right"))

The code is running without errors but the output is the problem. This code only stores the R^2 from the last rolling window (period 8-10) and overwrite the others I think, so it looks like this:

FUND1   c(0.675, 0.675, 0.675,...)
FUND2   c(0.447, 0.447, 0.447,...)
FUND3   .....

Can you guys help me so that the codes stores the R^2 for each window?

I have one possible solution for your task though it does not use do() nor step(). The approach is to separate the FUNDS in individual list items, convert it to a daily timeseries and work from there:

library(dplyr)
library(tidyr)
library(zoo)
library(purrr)
library(plyr)

# your dummy data
FUND_DATA <- tibble(
  DATE = 1:10,
  FUND1 = rnorm(10),
  FUND2 = rnorm(10),
  FUND3 = rnorm(10),
  FUND4 = rnorm(10))
# your dummy data
FACTORS <- tibble(
  DATE = 1:10,
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10),
  x4 = rnorm(10))

# first part of your code (had to split it to use it later for naming)
REG_DATA <- FUND_DATA %>%
  tidyr::pivot_longer(contains("FUND"),  names_to = "FUND",
                      values_to = "PRICE") %>%
  dplyr::arrange(FUND,DATE) %>% 
  dplyr::left_join(., FACTORS, by = "DATE") 

# make it o a list of timeseries
lts <-  REG_DATA %>%  
  # core data of timeseries is a matrix and allows only one data type (we prefer numeric thus cut "FUND" and preserve only the number)
  dplyr::mutate(FUND = as.numeric(substr(FUND, 5, 5))) %>% 
  group_by(FUND) %>% 
  mutate(RET = PRICE/lag(PRICE)-1) %>% 
  drop_na() %>%
  # split by groups into list items
  dplyr::group_split() %>% 
  # convert each list item to a time series with starting date and length according to each list item 
  purrr::map( ~ xts::xts(.x, order.by  = seq(as.Date("2020-01-01"), as.Date("2020-01-01") + length(.x), by = 1)))

# map the rollapply to the timeseries and extract R² => !!! width should be larger than 2 because you have 4 explanatory variables (6 seems to be the minimum) 
res <- purrr::map(lts, ~ rollapply(.x,width = 6, 
                  FUN = function(x) 
                  summary(lm(RET ~ x1+x2+x3+x4, data = as.data.frame(x)))$r.squared,
                  by.column = FALSE, align = "right"))

# deconstruct the time series to a data.frame (there might be a better way)
res2 <- purrr::map(res,  ~ data.frame(TS = zoo::index(.x),
                                      R2 = zoo::coredata(.x))) 

# get the unqiue FUND names and assing as list item names (you could use a vector instead)
names(res2) <- unique(REG_DATA$FUND)

# condense the list items to a data.frame using the before assinged names as a row
plyr::ldply(res2)


     .id         TS        R2
1  FUND1 2020-01-01        NA
2  FUND1 2020-01-02        NA
3  FUND1 2020-01-03        NA
4  FUND1 2020-01-04        NA
5  FUND1 2020-01-05        NA
6  FUND1 2020-01-06 0.3556052
7  FUND1 2020-01-07 0.7670353
8  FUND1 2020-01-08 0.9077215
9  FUND1 2020-01-09 0.9758644
10 FUND2 2020-01-01        NA
11 FUND2 2020-01-02        NA
12 FUND2 2020-01-03        NA
13 FUND2 2020-01-04        NA
14 FUND2 2020-01-05        NA
15 FUND2 2020-01-06 0.8021993
16 FUND2 2020-01-07 0.8755639
17 FUND2 2020-01-08 0.8206098
18 FUND2 2020-01-09 0.8296576
19 FUND3 2020-01-01        NA
20 FUND3 2020-01-02        NA
21 FUND3 2020-01-03        NA
22 FUND3 2020-01-04        NA
23 FUND3 2020-01-05        NA
24 FUND3 2020-01-06 0.4545569
25 FUND3 2020-01-07 0.4172101
26 FUND3 2020-01-08 0.3604151
27 FUND3 2020-01-09 0.9877962
28 FUND4 2020-01-01        NA
29 FUND4 2020-01-02        NA
30 FUND4 2020-01-03        NA
31 FUND4 2020-01-04        NA
32 FUND4 2020-01-05        NA
33 FUND4 2020-01-06 0.9541878
34 FUND4 2020-01-07 0.9973588
35 FUND4 2020-01-08 0.9991080
36 FUND4 2020-01-09 0.9965382

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM