
Rolling stepwise regression in R

I have a data frame of 12 predictors and a numeric vector called BEI, which I want to predict. I want to run stepwise selection on every rolling window of 12 rows, for example rows 1:12, 2:13, and so on. For each window, I want to return the coefficients and use them to predict BEI. Below is my code:

k = length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
  BEI.subset <- BEI[i:(i+11)]
  predictors.subset <- predictors[c(i:(i+11)),]
  fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
  fit.summary <- summary(fit.stepwise)
  id <- which.min(fit.summary$cp)
  coefficients <- coef(fit.stepwise,id)
  coef.list <- append(coef.list, coefficients)
  form <- as.formula(fit.stepwise$call[[2]])
  mat <- model.matrix(form,predictors.subset)
  predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
  predicted.list <- append(predicted.list, predicted.stepwise)
}

I got output like this:

Reordering variables and trying again:
There were 50 or more warnings (use warnings() to see the first 50)

The warnings are:

1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1 linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1 linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1 linear dependencies found
.... etc.

How do I fix this? Or is there a better way to write the code?

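You run into the error because some of the rolling subsets contain missing values (NA): a window that starts near the end of the data runs past the last row, and the rows that do not exist are filled with NA.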

Using data(swiss) as an example:

dim(swiss) 
# [1] 47  6

split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
length(split_swiss)
# [1] 47  ## the rolling subsets produce 47 data.frames

lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames 
[[1]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Neuchatel         64.4        17.6          35        32    16.92             23.0
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3

[[2]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA

[[3]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA

[[4]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA
NA.2                NA          NA          NA        NA       NA               NA

[[5]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Droite      44.7        46.6          16        29    50.43             18.2
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA

[[6]]
            Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA
NA.4               NA          NA          NA        NA       NA               NA
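To check how many windows are affected, the complete rows in each subset can be counted. This is a minimal sketch using base R only; the names `window` and `complete_rows` are introduced here for illustration and are not part of the original answer.

```r
data(swiss)

window <- 12  # window length used throughout this answer

# Same construction as above: one subset per starting row, so late
# windows index past the last row and get padded with NA rows.
split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x + window - 1), ])

# Number of rows without any NA in each subset
complete_rows <- sapply(split_swiss, function(d) sum(complete.cases(d)))

# Windows starting at rows 1..36 are full; the last 11 are padded.
sum(complete_rows == window)  # 36
sum(complete_rows < window)   # 11
```

Only a window whose start row is at most `nrow(swiss) - window + 1` stays inside the data, which is why the last 11 subsets are incomplete.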

An error follows if you run regsubsets on these data.frames, some of which are left with more predictors than complete cases:

library(leaps)  # provides regsubsets()
lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))

 Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  : 
  y and x different lengths
In addition: Warning messages:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  :
  1  linear dependencies found
 ......

Instead, you can keep only the subsets with 12 complete rows and continue with the regression like so:

split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
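As an alternative to filtering afterwards, the windows can be generated so that none of them runs past the data, and the coefficients and in-window predictions the question asks for can be collected in the same pass. The following is a sketch under assumptions, not a definitive implementation: it uses the swiss data rather than the asker's BEI and predictors, picks the model size by minimum Cp as in the question, and the helper `fit_window` and the names `starts` and `results` are hypothetical.

```r
library(leaps)  # provides regsubsets()

data(swiss)
window <- 12

# Start rows 1..36 only, so every window has exactly 12 real rows
starts <- seq_len(nrow(swiss) - window + 1)

# Hypothetical helper: forward selection on one window, then
# in-window predictions from the Cp-selected model.
fit_window <- function(i) {
  d <- swiss[i:(i + window - 1), ]
  fit <- regsubsets(Fertility ~ ., data = d, nvmax = 10, method = "forward")
  id <- which.min(summary(fit)$cp)  # model size with the lowest Mallows' Cp
  beta <- coef(fit, id)             # named vector, includes "(Intercept)"
  X <- model.matrix(Fertility ~ ., data = d)
  pred <- drop(X[, names(beta), drop = FALSE] %*% beta)
  list(coef = beta, fitted = pred)
}

# One list element per window; coefficient vectors may differ in length
results <- lapply(starts, fit_window)
```

Keeping the results in a list avoids the flattening that `append()` on a numeric vector causes when the selected models differ in size. Note that in the question's setup, 12 predictors plus an intercept on 12 rows give a design matrix with more columns than rows, so the "linear dependencies found" warning would likely persist there unless the window is made longer or the predictor set smaller.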
