Loop for multiple linear regression

Hi, I'm starting to use R and am stuck on analyzing my data. I have a data frame with 80 columns. Column 1 is the dependent variable and columns 2 to 80 are the independent variables. I want to fit 78 multiple linear regressions, keeping the first independent variable (column 2) fixed in every model, and save all the regressions in a list so that I can later compare the models using AIC scores. How can I do it?

Here is my loop:

data.frame

for (i in 2:80) {
  Regressions <- lm(data.frame$column1 ~ data.frame$column2 + data.frame[, i])
}

Using the iris dataset as an example, you can do:

lapply(seq_along(iris)[-c(1:2)], function(x) lm(data = iris[,c(1:2, x)]))

[[1]]

Call:
lm(data = iris[, c(1:2, x)])

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length  
      2.2491        0.5955        0.4719  


[[2]]

Call:
lm(data = iris[, c(1:2, x)])

Coefficients:
(Intercept)  Sepal.Width  Petal.Width  
     3.4573       0.3991       0.9721  


[[3]]

Call:
lm(data = iris[, c(1:2, x)])

Coefficients:
      (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica  
           2.2514             0.8036             1.4587             1.9468  

This works because when you pass a data frame to lm() without a formula, it applies the function DF2formula() under the hood, which treats the first column as the response and all other columns as predictors.
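To see which formula lm() ends up with, you can call formula() on the subsetted data frame yourself. This is a minimal sketch that relies on the data-frame method of formula() from the stats package; the names sub, f, fit_df and fit_f are only illustrative.

```r
# Columns 1 and 2 stay fixed; column 3 is the varying predictor.
sub <- iris[, c(1:2, 3)]

# formula() on a data frame builds "first column ~ all remaining columns".
f <- formula(sub)
print(f)

# Fitting with the explicit formula should give the same coefficients
# as passing the data frame alone.
fit_df <- lm(data = sub)
fit_f  <- lm(f, data = sub)
all.equal(coef(fit_df), coef(fit_f))
```

Because the response is always taken from the first column, this shortcut only works if the columns are ordered with the dependent variable first.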

With a for loop, we can initialize a list to store the output:

# Varying predictors are columns 3 to 80 (78 models); column2 is fixed in every model
nm1 <- names(df1)[3:80]
Regressions <- vector('list', length(nm1))
for(i in seq_along(Regressions)) {
  Regressions[[i]] <- lm(reformulate(c("column2", nm1[i]), "column1"), data = df1)
}

Or use paste0() with as.formula() instead of reformulate():

for(i in seq_along(Regressions)) {
  Regressions[[i]] <- lm(as.formula(paste0("column1 ~ column2 + ", nm1[i])),
                         data = df1)
}

Using a reproducible example:

nm2 <- names(iris)[3:5]
Regressions2 <- vector('list', length(nm2))
for(i in seq_along(Regressions2)) {
  Regressions2[[i]] <- lm(reformulate(c("Sepal.Width", nm2[i]), "Sepal.Length"), data = iris)
}

Regressions2[[1]]

#Call:
#lm(formula = reformulate(c("Sepal.Width", nm2[i]), "Sepal.Length"), 
#    data = iris)

#Coefficients:
# (Intercept)   Sepal.Width  Petal.Length  
#      2.2491        0.5955        0.4719  
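The question also asks about comparing the stored models by AIC, which the loops above do not show. One possible way is sketched below on the iris example; it uses the standard AIC() function on each fit, and the names fits and aic are hypothetical, chosen only for this illustration.

```r
# Refit the iris models, one per varying predictor, and keep them in a named list.
nm2 <- names(iris)[3:5]
fits <- lapply(nm2, function(v) {
  lm(reformulate(c("Sepal.Width", v), response = "Sepal.Length"), data = iris)
})
names(fits) <- nm2

# One AIC value per model; a lower AIC indicates a better
# trade-off between goodness of fit and model complexity.
aic <- sapply(fits, AIC)
sort(aic)

# Name of the added predictor whose model has the lowest AIC.
names(which.min(aic))
```

Naming the list elements after the varying predictor makes the sorted AIC vector readable without having to track positions in the list.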
