How to run a linear regression on a data set, taking each variable in turn as the dependent variable?

Question

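I have a data set called `dt` in which all variables are numeric. I want to take each variable in turn as the dependent variable and use stepwise regression to find the best combination of the remaining predictors. If the resulting "best combination" has an adjusted R^2 > 0.70, print it to the console. Here is my naive attempt: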

for(i in ncol(dt)){
    nul<-lm(dt[,i]~1,data=dt)
    ful<-lm(dt[,i]~.,data=dt)
    model<-step(nul,scope = list(lower=nul,upper=ful),direction="forward",trace=FALSE)
    if((summary(lm(as.formula(model$call),data=dt)))$adj.r.squared>0.70){
        print(as.formula(model$call))
        cat(paste("\n"))
    }
}

This is the unwanted output I get:

dt[, i] ~ Y

Warning messages:
1: attempting model selection on an essentially perfect fit is nonsense 
2: In summary.lm(lm(as.formula(model$call), data = dt)) :
essentially perfect fit: summary may be unreliable
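The output points to two separate problems in the loop. `for(i in ncol(dt))` iterates over the single value `ncol(dt)`, so only the last column is ever used as the response. And because `dt[,i]` is not a column name, the `.` in `dt[,i]~.` expands to every column of `dt`, including the response column itself (apparently `Y` here), so `step()` adds it first and the fit is perfect, which is what both warnings complain about. A minimal corrected sketch, assuming the built-in `swiss` data as a stand-in for `dt` (`threshold` and `fits` are illustrative names), builds each formula from column names so the response is excluded from its own scope:

```r
# Forward stepwise selection with each column in turn as the response.
# `swiss` stands in for the asker's all-numeric data frame `dt` (assumption).
dt <- swiss
threshold <- 0.70
fits <- list()   # selected formula and adjusted R^2 for every response

for (i in seq_along(dt)) {                        # every column, not only ncol(dt)
  response   <- names(dt)[i]
  predictors <- setdiff(names(dt), response)      # the response must not be a candidate
  lower <- reformulate("1", response = response)  # e.g. Fertility ~ 1
  upper <- reformulate(predictors, response = response)
  nul   <- lm(lower, data = dt)
  model <- step(nul, scope = list(lower = lower, upper = upper),
                direction = "forward", trace = 0)
  adj_r2 <- summary(model)$adj.r.squared          # read from the fitted model, no refit
  fits[[response]] <- list(formula = formula(model), adj_r2 = adj_r2)
  if (adj_r2 > threshold) print(formula(model))
}
```

Reading `adj.r.squared` straight from `summary(model)` also avoids refitting the model from `model$call`.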

Answer 1

As @42- correctly pointed out, you would be getting statistical "garbage".

But if you still insist on "testing" it, getting the R^2 of many linear models is fairly easy with leaps::regsubsets.

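library(leaps)
# Response: Fertility (column 1); each of the other 5 columns as a single predictor (nvmax = 1), no intercept
a <- regsubsets(as.matrix(swiss[, -1]), y = swiss[, 1], nvmax = 1, nbest = 100,
                intercept = FALSE, method = "exhaustive", really.big = TRUE)
summary(a)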

Subset selection object
5 Variables 
                 Forced in Forced out
Examination          FALSE      FALSE
Education            FALSE      FALSE
Catholic             FALSE      FALSE
Infant.Mortality     FALSE      FALSE
100 subsets of each size up to 1
Selection Algorithm: exhaustive
         Agriculture Examination Education Catholic Infant.Mortality
1  ( 1 ) " "         " "         " "       " "      "*"             
1  ( 2 ) "*"         " "         " "       " "      " "             
1  ( 3 ) " "         "*"         " "       " "      " "             
1  ( 4 ) " "         " "         " "       "*"      " "             
1  ( 5 ) " "         " "         "*"       " "      " "

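The example above fits five lm models with Fertility as the dependent variable, each using one of the remaining variables as its single predictor (and, because of intercept=F, no intercept), e.g. Fertility ~ Infant.Mortality, Fertility ~ Agriculture, and so on.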

summary(a)$rsq # returns R^2 for each of the five models

[1] 0.9703145 0.8558076 0.7054873 0.5660736 0.4474043
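With intercept=F every model is fitted through the origin, and R^2 is then measured against sum(y^2) rather than against the spread of y around its mean, which can make a weak predictor look strong. A minimal base R sketch (no leaps needed; `r2_origin` and `r2_centered` are illustrative names) refits the same five one-predictor models with `lm()` both ways. Assuming leaps uses that uncentered null RSS when `intercept = FALSE`, the first column should reproduce the rsq values above:

```r
# Compare R^2 with and without an intercept for Fertility ~ each single predictor.
predictors <- setdiff(names(swiss), "Fertility")

r2_origin <- vapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "Fertility", intercept = FALSE), data = swiss)
  summary(fit)$r.squared   # no intercept: 1 - RSS / sum(y^2)
}, numeric(1))

r2_centered <- vapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "Fertility"), data = swiss)
  summary(fit)$r.squared   # with intercept: 1 - RSS / sum((y - mean(y))^2)
}, numeric(1))

round(cbind(through_origin = r2_origin, with_intercept = r2_centered), 4)
```

For Fertility ~ Infant.Mortality, for instance, the through-origin R^2 is about 0.97 while the usual R^2 with an intercept is far lower.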

By wrapping the call above in a function:

nonsense_lm <- function(data, x) regsubsets(as.matrix(data[, -x]), y = data[, x], nvmax = 1, nbest = 100,
                                            intercept = FALSE, method = "exhaustive", really.big = TRUE)  # x = response column

and then looping over every variable, using each one in turn as the response:

nonsense <- lapply(1:ncol(swiss), function(x) nonsense_lm(swiss, x))
lapply(nonsense, function(x)summary(x)$rsq)

 [[1]]
 [1] 0.9703145 0.8558076 0.7054873 0.5660736 0.4474043

 [[2]]
 [1] 0.8558076 0.8121654 0.5785572 0.4961365 0.2715248

 [[3]]
 [1] 0.7844437 0.7729180 0.7054873 0.4961365 0.2132834

 [[4]]
 [1] 0.7729180 0.5456765 0.4474043 0.2715248 0.2137402

 [[5]]
 [1] 0.5785572 0.5660736 0.5135628 0.2137402 0.2132834

 [[6]]
 [1] 0.9703145 0.8121654 0.7844437 0.5456765 0.5135628
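Because every model here has a single predictor and no intercept, its R^2 reduces to (x'y)^2 / (x'x * y'y), the squared cosine similarity of the two columns. That is why the list above is symmetric: 0.8558076 appears both for Fertility ~ Agriculture and for Agriculture ~ Fertility. A base R sketch, assuming that uncentered definition of R^2 (`G`, `R2` and `sorted` are illustrative names), computes all pairs at once and labels each element by its response, which the unnamed [[1]] to [[6]] output lacks:

```r
# Uncentered R^2 for every (response, single predictor) pair, regression through the origin.
X  <- as.matrix(swiss)
G  <- crossprod(X)                        # G[i, j] = sum(X[, i] * X[, j])
R2 <- G^2 / outer(diag(G), diag(G))       # (x'y)^2 / (x'x * y'y)
diag(R2) <- NA                            # a variable never predicts itself

# Same numbers as the lapply() output, sorted per response and labelled by name
sorted <- lapply(rownames(R2), function(v) sort(R2[v, colnames(R2) != v], decreasing = TRUE))
names(sorted) <- rownames(R2)
sorted$Fertility
```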

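Again, note that these R^2 values are effectively statistical "garbage". Framing a proper question to test is the most critical step of any analysis.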

Answer 1 (score 1, 2016-06-17 07:33:09)

如何在数据集上运行线性回归，每次都将一个变量作为因变量？

问题描述

1 个解决方案

解决方案1 1 2016-06-17 07:33:09

解决方案1
1 2016-06-17 07:33:09