I have a data frame of 12 predictors and a list of numbers called BEI (which I want to predict). I want to run stepwise selection on every window of 12 consecutive rows, for example rows 1:12, 2:13, and so on. For each window, I want to return the coefficients and use them to predict BEI. Below is my code:
k = length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
  BEI.subset <- BEI[i:(i+11)]
  predictors.subset <- predictors[c(i:(i+11)),]
  fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
  fit.summary <- summary(fit.stepwise)
  id <- which.min(fit.summary$cp)
  coefficients <- coef(fit.stepwise,id)
  coef.list <- append(coef.list, coefficients)
  form <- as.formula(fit.stepwise$call[[2]])
  mat <- model.matrix(form,predictors.subset)
  predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
  predicted.list <- append(predicted.list, predicted.stepwise)
}
and I got messages like this:
Reordering variables and trying again:
There were 50 or more warnings (use warnings() to see the first 50)
The warnings are:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1 linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1 linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1 linear dependencies found
.... etc.
How do I fix this? Or is there a better way to write the code?
You run into the error because of missing values (NA) in the rolling data subsets. Indexing past the last row of a data frame returns rows filled with NA.
Using data(swiss) as an example:
dim(swiss)
# [1] 47 6
split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
length(split_swiss)
# [1] 47 ## the rolling subsets produce 47 data.frames
lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames
[[1]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Neuchatel 64.4 17.6 35 32 16.92 23.0
Val de Ruz 77.6 37.6 15 7 4.97 20.0
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
[[2]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Val de Ruz 77.6 37.6 15 7 4.97 20.0
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
[[3]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
[[4]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
[[5]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA
[[6]]
Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Gauche 42.8 27.7 22 29 58.33 19.3
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA
NA.4 NA NA NA NA NA NA
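To see how many windows are affected, one could count the complete rows in each subset. This is only a small diagnostic sketch; the name `n_complete` is introduced here for illustration and is not part of the original answer.

```r
data(swiss)

# Same rolling subsets as above; windows that run past row 47 get NA rows
split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x + 11), ])

# Number of rows without any NA in each window (n_complete is a hypothetical name)
n_complete <- sapply(split_swiss, function(d) sum(complete.cases(d)))

sum(n_complete == 12)
# [1] 36
```

Only the first 36 windows have all 12 rows; the remaining 11 are padded with NA rows.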
An error follows if you run regsubsets on these data.frames, some of which have more predictors than complete cases.
lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in, :
  y and x different lengths
In addition: Warning messages:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in, :
  1 linear dependencies found
......
Instead, I can retain only the subsets with 12 complete rows and continue with the regression like so:
split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
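An alternative is to never create the incomplete windows in the first place, and to collect the coefficients and predictions per window in a list. The following is a minimal sketch on the swiss data, assuming the leaps package is installed; the names `window`, `starts`, and `results` are made up for illustration, and selecting the model by the lowest Cp follows the question's code rather than anything the answer prescribes.

```r
library(leaps)  # provides regsubsets()

data(swiss)
window <- 12  # rows per rolling window, as in the question

# Start indices chosen so that every window lies fully inside the data
starts <- seq_len(nrow(swiss) - window + 1)

results <- lapply(starts, function(i) {
  sub <- swiss[i:(i + window - 1), ]
  fit <- regsubsets(Fertility ~ ., data = sub, nvmax = 10, method = "forward")
  id  <- which.min(summary(fit)$cp)   # model size with the lowest Cp
  cf  <- coef(fit, id)                # named coefficients, including the intercept
  X   <- model.matrix(Fertility ~ ., data = sub)
  # Keep only the selected columns, then compute in-window predictions
  list(coef = cf, fitted = drop(X[, names(cf), drop = FALSE] %*% cf))
})

length(results)
# [1] 36
```

Storing one list element per window keeps each window's coefficients separate, whereas appending to a single numeric vector mixes coefficients from models of different sizes.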