How to run a regression with a single explanatory variable and multiple dependent variables?
How to run linear regression on a dataset, taking each time a single variable as the dependent variable?
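I have a dataset called "dt" in which all variables are numeric. I want to take each variable in turn as the dependent variable and use stepwise regression to find the best combination of the remaining predictors. If the resulting "best combination" gives an adjusted R^2 > 0.70, it should be printed to the console. Here is my naive attempt: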
for(i in ncol(dt)){
  nul<-lm(dt[,i]~1,data=dt)
  ful<-lm(dt[,i]~.,data=dt)
  model<-step(nul,scope = list(lower=nul,upper=ful),direction="forward",trace=FALSE)
  if((summary(lm(as.formula(model$call),data=dt)))$adj.r.squared>0.70){
    print(as.formula(model$call))
    cat(paste("\n"))
  }
}
This is the unwanted output I get:
dt[, i] ~ Y
Warning messages:
1: attempting model selection on an essentially perfect fit is nonsense
2: In summary.lm(lm(as.formula(model$call), data = dt)) :
essentially perfect fit: summary may be unreliable
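The "essentially perfect fit" warnings are consistent with two details of the loop above: `for(i in ncol(dt))` visits only the last column, and `dt[,i]~.` with `data=dt` keeps column `i` itself among the candidate predictors, so the response ends up regressed on itself (hence `dt[, i] ~ Y`). A minimal sketch of one way to avoid both, assuming the built-in `swiss` data as a stand-in for `dt` (the names `resp`, `predictors` and `selected` are illustrative, not from the original post):

```r
# Stand-in data: the question's `dt` is not available, so `swiss` is assumed here.
dt <- swiss

selected <- list()
for (resp in names(dt)) {                        # iterate over every column, not just the last
  predictors <- setdiff(names(dt), resp)         # exclude the response from the predictors
  nul <- lm(reformulate("1", response = resp), data = dt)         # intercept-only model
  ful <- lm(reformulate(predictors, response = resp), data = dt)  # all remaining columns
  model <- step(nul,
                scope = list(lower = formula(nul), upper = formula(ful)),
                direction = "forward", trace = 0)
  adj_r2 <- summary(model)$adj.r.squared
  selected[[resp]] <- adj_r2                     # keep every result for later inspection
  if (adj_r2 > 0.70) {
    print(formula(model))
    cat("adjusted R^2:", round(adj_r2, 3), "\n\n")
  }
}
```

This only removes the mechanical problems; the statistical objection to screening every variable this way still applies.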
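As @42- correctly pointed out, you will get statistical "garbage".
However, if you still insist on "testing" this, it is fairly easy to obtain the R^2 of multiple linear models using leaps::regsubsets.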
library(leaps)
a <- regsubsets(as.matrix(x=swiss[,-1]),y=swiss[,1], nvmax=1, nbest=100, intercept=F, method="exhaustive", really.big=T)
summary(a)
Subset selection object
5 Variables
                 Forced in Forced out
Examination          FALSE      FALSE
Education            FALSE      FALSE
Catholic             FALSE      FALSE
Infant.Mortality     FALSE      FALSE
100 subsets of each size up to 1
Selection Algorithm: exhaustive
         Agriculture Examination Education Catholic Infant.Mortality
1  ( 1 ) " "         " "         " "       " "      "*"
1  ( 2 ) "*"         " "         " "       " "      " "
1  ( 3 ) " "         "*"         " "       " "      " "
1  ( 4 ) " "         " "         " "       "*"      " "
1  ( 5 ) " "         " "         "*"       " "      " "
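The example above fits five lm models with Fertility as the dependent variable, each using one of the remaining variables as its single predictor (and no intercept, because of intercept=F), e.g. Fertility ~ Infant.Mortality, Fertility ~ Agriculture, and so on.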
summary(a)$rsq # returns R^2 for each of the five models
[1] 0.9703145 0.8558076 0.7054873 0.5660736 0.4474043
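These values are high largely because the models are fitted without an intercept, in which case R^2 is measured against a baseline that predicts zero everywhere rather than the mean of the response. A small sketch in base R, assuming that `regsubsets(..., intercept=F)` reports the same uncentered R^2 that `summary.lm` gives for a no-intercept model (the first value above is consistent with that assumption):

```r
y <- swiss$Fertility

fit_no_int <- lm(Fertility ~ Infant.Mortality - 1, data = swiss)  # no intercept, as with intercept=F
fit_int    <- lm(Fertility ~ Infant.Mortality, data = swiss)      # usual model with an intercept

# Uncentered R^2: residual sum of squares relative to sum(y^2), not to sum((y - mean(y))^2)
rss <- sum(resid(fit_no_int)^2)
r2_uncentered <- 1 - rss / sum(y^2)

r2_no_int <- summary(fit_no_int)$r.squared
r2_int    <- summary(fit_int)$r.squared

print(c(no_intercept = r2_no_int, manual = r2_uncentered, with_intercept = r2_int))
```

The no-intercept and manual values agree, while the model with an intercept gives a much smaller R^2 for the same pair of variables.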
By wrapping the call above in a function:
nonsense_lm <- function(data, x) regsubsets(as.matrix(x=data[,-x]),y=data[,x], nvmax=1, nbest=100, intercept=F, method="exhaustive", really.big=T)
and then looping over the columns so that each variable in turn is the dependent variable:
nonsense <- lapply(1:ncol(swiss), function(x) nonsense_lm(swiss, x))
lapply(nonsense, function(x)summary(x)$rsq)
[[1]]
[1] 0.9703145 0.8558076 0.7054873 0.5660736 0.4474043
[[2]]
[1] 0.8558076 0.8121654 0.5785572 0.4961365 0.2715248
[[3]]
[1] 0.7844437 0.7729180 0.7054873 0.4961365 0.2132834
[[4]]
[1] 0.7729180 0.5456765 0.4474043 0.2715248 0.2137402
[[5]]
[1] 0.5785572 0.5660736 0.5135628 0.2137402 0.2132834
[[6]]
[1] 0.9703145 0.8121654 0.7844437 0.5456765 0.5135628
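The unnamed vectors above do not show which predictor produced which value. A sketch in base R that refits the same single-predictor, no-intercept models with plain `lm`, keeps the variable names, and lists the pairs above the 0.70 threshold from the question (here applied to plain R^2 rather than adjusted R^2; `r2`, `hits` and `result` are illustrative names):

```r
vars <- names(swiss)

# Matrix of R^2 values: rows are predictors, columns are responses
r2 <- sapply(vars, function(resp) {
  sapply(vars, function(pred) {
    if (pred == resp) return(NA_real_)            # skip regressing a variable on itself
    fit <- lm(reformulate(c("0", pred), response = resp), data = swiss)  # resp ~ 0 + pred
    summary(fit)$r.squared
  })
})

# Keep only the response/predictor pairs above the threshold (NA entries are ignored)
hits <- which(r2 > 0.70, arr.ind = TRUE)
result <- data.frame(
  response  = colnames(r2)[hits[, "col"]],
  predictor = rownames(r2)[hits[, "row"]],
  r.squared = r2[hits]
)
print(result[order(-result$r.squared), ], row.names = FALSE)
```

With a single predictor and no intercept the value is symmetric in the two variables, which is why the same numbers recur across the list elements above.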
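Again, note that these R^2 values are effectively statistical "garbage". Posing a proper question to test is the most critical step of any analysis.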