
Rolling stepwise regression in R

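I have a data frame with 12 predictor variables and a list of numbers called BEI (which I want to predict). I want to run stepwise selection on every 12 rows of the data (e.g. 1:12, 2:13, and so on). For each rolling window, I want to return the coefficients and use them to predict BEI. Below is my code: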

library(leaps)  # provides regsubsets()
k <- length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
  BEI.subset <- BEI[i:(i+11)]
  predictors.subset <- predictors[c(i:(i+11)),]
  fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
  fit.summary <- summary(fit.stepwise)
  id <- which.min(fit.summary$cp)
  coefficients <- coef(fit.stepwise,id)
  coef.list <- append(coef.list, coefficients)
  form <- as.formula(fit.stepwise$call[[2]])
  mat <- model.matrix(form,predictors.subset)
  predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
  predicted.list <- append(predicted.list, predicted.stepwise)
}

I get errors like this: Reordering variables and trying again: There were 50 or more warnings (use warnings() to see the first 50)

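The warnings are as follows:

1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1  linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1  linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... :
  1  linear dependencies found
... and so on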

How can I fix this? Or is there a better way to write the code?

The error occurs because the rolling data subsets contain missing values (NA).

Take the swiss data set (data(swiss)) as an example:

dim(swiss) 
# [1] 47  6

split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
length(split_swiss)
# [1] 47  ## the rolling subsets produce 47 data.frames

lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames 
[[1]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Neuchatel         64.4        17.6          35        32    16.92             23.0
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3

[[2]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA

[[3]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA

[[4]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA
NA.2                NA          NA          NA        NA       NA               NA

[[5]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Droite      44.7        46.6          16        29    50.43             18.2
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA

[[6]]
            Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA
NA.4               NA          NA          NA        NA       NA               NA

If you run regsubsets on these data.frames, an error occurs in the cases where the number of predictors is larger than the number of cases.

library(leaps)  # provides regsubsets()
lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))

 Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  : 
  y and x different lengths
In addition: Warning messages:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  :
  1  linear dependencies found
 ......
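The "linear dependencies found" warning can be reproduced without leaps. The sketch below is only an illustration on simulated data, assuming a window shaped like the one in the question (12 rows, 12 predictors); the names `window`, `X` and `rank_X` are made up for this example. With an intercept the design matrix has 13 columns but only 12 rows, so at least one column must be a linear combination of the others.

```r
# Hypothetical stand-in for one rolling window: 12 rows, 12 predictors
set.seed(1)
n_rows <- 12
n_pred <- 12
window <- as.data.frame(matrix(rnorm(n_rows * n_pred), nrow = n_rows))

# Design matrix = intercept column + the 12 predictor columns
X <- model.matrix(~ ., data = window)
dim(X)                  # 12 rows, 13 columns

# The rank of a matrix can never exceed its number of rows
rank_X <- qr(X)$rank
ncol(X) - rank_X        # number of linearly dependent columns (at least 1 here)
```

If the real data look like this, a window needs more rows than predictors plus one before every coefficient can be estimated.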

Instead, I can keep only the subsets that have 12 rows and continue with the regression:

split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
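An alternative to filtering afterwards is to stop the window index at nrow - 11, so that no incomplete window is ever created. The following is a minimal sketch on the swiss data, not a definitive implementation: it selects the model size by Cp as in the question, stores the coefficients per window, and computes fitted values for the rows of that window. The names `width`, `starts` and `roll_fit` are invented for this example, and nvmax = 5 is chosen here because swiss has only five predictors.

```r
library(leaps)  # provides regsubsets()

data(swiss)
width  <- 12
starts <- seq_len(nrow(swiss) - width + 1)   # 1..36, so every window has 12 full rows

roll_fit <- lapply(starts, function(i) {
  window <- swiss[i:(i + width - 1), ]
  fit <- regsubsets(Fertility ~ ., data = window, nvmax = 5, method = "forward")
  id   <- which.min(summary(fit)$cp)         # model size with the smallest Mallows' Cp
  beta <- coef(fit, id)                      # named vector, includes "(Intercept)"
  X    <- model.matrix(Fertility ~ ., data = window)
  # Keep only the selected columns, then compute in-window fitted values
  fitted <- drop(X[, names(beta), drop = FALSE] %*% beta)
  list(start = i, coef = beta, fitted = fitted)
})

length(roll_fit)      # one result per complete window
roll_fit[[1]]$coef    # coefficients selected for rows 1:12
```

Keeping each window's coefficients as a separate list element preserves their names, which a flat vector built with append() would mix across windows. Note that the fitted values here are in-sample; predicting the row after each window would need the model matrix for that next row instead.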
