
Loop multiple 'multiple linear regressions' in R

I have a data set on which I want to run several multiple regressions. They all look like this:

fit <- lm(Variable1 ~ Age + Speed + Gender + Mass, data=Data)

The only variable that changes is Variable1. Now I want to loop, or use something from the apply family, to put several variables in the place of Variable1. These variables are columns in my data file. Can someone help me solve this problem? Many thanks!

What I tried so far:

When I extract one of the column names with the names() function, I do get the name of the column:

Varname = as.name(names(Data[14]))

But when I fill this in (and I used the attach() function):

fit <- lm(Varname ~ Age + Speed + Gender + Mass, data=Data) 

I get the following error:

Error in model.frame.default(formula = Varname ~ Age + Speed + Gender + : object is not a matrix

I suppose that the lm() function does not recognize Varname as Variable1.

You can use lapply to loop over your variables.

fit <- lapply(Data[,c(...)], function(x) lm(x ~ Age + Speed + Gender + Mass, data = Data))

This gives you a list of fitted models, one per selected column.

The c(...) should contain your variable names as strings. Alternatively, you can choose the variables by their position in Data, like Data[,1:5].
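As a minimal sketch of the lapply approach above, assuming a small simulated data frame (the response columns y1 and y2 and all values are made up for illustration), selecting the columns by name could look like this:

```r
# Simulated data; y1 and y2 are hypothetical response columns
set.seed(1)
Data <- data.frame(
  y1 = rnorm(30), y2 = rnorm(30),
  Age = sample(0:90, 30), Speed = rnorm(30, 60, 10),
  Gender = sample(c("W", "M"), 30, replace = TRUE), Mass = rnorm(30)
)

# lapply hands each selected column to the function as the vector x;
# the predictors are looked up in Data
fit <- lapply(Data[, c("y1", "y2")], function(x)
  lm(x ~ Age + Speed + Gender + Mass, data = Data))

names(fit)          # list elements are named after the selected columns
sapply(fit, coef)   # one column of coefficients per response
```

Note that this pattern relies on Data not having a column that is itself called x; if it did, lm would pick up that column for every fit instead of the vector passed in by lapply.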

The problem in your case is that lm() reads the names in the formula literally: it looks for a column called Varname in the data, or otherwise feeds whatever object has that name straight into the regression, instead of using the column name stored in it. Therefore, to use a column name held in a variable such as varnames, you need to build the formula from the value of that variable and combine it with the other terms.

# generate some data
set.seed(123)
Data <- data.frame(x = rnorm(30), y = rnorm(30), 
    Age = sample(0:90, 30), Speed = rnorm(30, 60, 10), 
    Gender = sample(c("W", "M"), 30, replace = TRUE), Mass = rnorm(30))
varnames <- names(Data)[1:2]

# fit regressions for multiple dependent variables 
fit <- lapply(varnames, 
    FUN=function(x) lm(formula(paste(x, "~Age+Speed+Gender+Mass")), data=Data))
names(fit) <- varnames

 fit
$x

Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)

Coefficients:
(Intercept)          Age        Speed      GenderW         Mass  
   0.135423     0.010013    -0.010413     0.023480     0.006939  


$y

Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)

Coefficients:
(Intercept)          Age        Speed      GenderW         Mass  
   2.232269    -0.008035    -0.027147    -0.044456    -0.023895  
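The same idea can also be written without pasting strings, using reformulate() from the stats package, which builds a formula from character vectors. This is a sketch of an alternative rather than the code from the answer above; the predictors vector and the coefs table are names introduced here for illustration, and the data are regenerated so the block runs on its own.

```r
# same simulated data as in the answer above
set.seed(123)
Data <- data.frame(x = rnorm(30), y = rnorm(30),
    Age = sample(0:90, 30), Speed = rnorm(30, 60, 10),
    Gender = sample(c("W", "M"), 30, replace = TRUE), Mass = rnorm(30))
varnames <- names(Data)[1:2]
predictors <- c("Age", "Speed", "Gender", "Mass")  # hypothetical helper vector

# reformulate() builds e.g. x ~ Age + Speed + Gender + Mass for each name
fit <- lapply(varnames, function(v)
    lm(reformulate(predictors, response = v), data = Data))
names(fit) <- varnames

# collect the coefficients: one row per dependent variable
coefs <- t(sapply(fit, coef))
coefs
```

Because the formula is built from the column name rather than from the vector itself, this works even when a dependent variable is literally named x, as it is in this example data.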
