
R Loop for Variable Names to run linear regression model

First off, I am pretty new to this, so my method or thinking may be wrong. I have imported an xlsx data set into a data frame using R and RStudio. I want to loop through the column names to get all of the variables that contain exactly "_10_", so that I can run a simple linear regression on each of them. So here's my code:

indx <- grepl('_10_', colnames(data)) # logical vector: TRUE where the column name contains "_10_"
col10 <- names(data)[indx] # this gives me the names of the columns I want

Here is the for loop I have, which returns an error:

temp <- c()
for(i in 1:length(col10)){
   temp = col10[[i]]
  lm.test <- lm(Total_Transactions ~ temp[[i]], data = data)
  print(temp) #actually prints out the right column names
  i + 1
}

Is it even possible to run a loop to place those variables in the linear regression model? The error I am getting is: "Error in model.frame.default(formula = Total_Transactions ~ temp[[i]], : variable lengths differ (found for 'temp[[i]]')". If anyone could point me in the right direction I would be very grateful. Thanks.

Ok, I'll post an answer. I will use the dataset mtcars as an example. I believe it will work with your dataset.
First, I create a store, lm.test, an object of class list. In your code you assign the output of lm(.) to the same name every time through the loop, so in the end you would only have the last model; all the others would have been overwritten by the newer ones.
Then, inside the loop, I use the function reformulate to put together the regression formula from the column name. There are other ways of doing this but this one is simple.

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for(i in seq_along(col10)){
    lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test
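To make the difference from the original loop concrete, here is a small sketch on the same mtcars columns. It shows what reformulate builds, one of the other ways of building the same formula (as.formula with paste, which is my assumption about what the answer has in mind), and why passing a character string directly on the right-hand side fails.

```r
# Same example columns as in the answer
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

# reformulate() turns a character vector of term labels into a formula,
# here the formula mpg ~ cyl
f1 <- reformulate(col10[1], response = "mpg")
print(f1)

# An equivalent formula built by pasting strings together
f2 <- as.formula(paste("mpg ~", col10[1]))

# Both formulas give the same fit
fit1 <- lm(f1, data = data)
fit2 <- lm(f2, data = data)
print(all.equal(coef(fit1), coef(fit2)))

# By contrast, a character string used directly in the formula is treated
# as a length-1 variable, not as a column name, so lm() stops with
# "variable lengths differ"
temp <- col10[1]
res <- try(lm(mpg ~ temp, data = data), silent = TRUE)
print(inherits(res, "try-error"))
```

The last part mirrors the question's `Total_Transactions ~ temp[[i]]`: the formula never looks up the column that the string names, which is why the formula has to be constructed from the string first.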

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.
For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)

In order to get the summaries:

smry <- lapply(lm.test, summary)

It becomes very simple once you're familiar with the *apply functions.
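As one possible next step, the list of models can be collapsed into a single table. The sketch below refits the mtcars example with lapply and uses vapply to pull the slope and R-squared out of each model; the names results, predictor, slope and r_squared are my own choices for illustration, not part of the answer.

```r
# Refit the example models from the answer, this time with lapply
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- lapply(col10, function(v) lm(reformulate(v, "mpg"), data = data))
names(lm.test) <- col10  # label each model by its predictor

# One row per model: slope of the single predictor and R-squared
results <- data.frame(
  predictor = col10,
  slope     = vapply(lm.test, function(m) unname(coef(m)[2]), numeric(1)),
  r_squared = vapply(lm.test, function(m) summary(m)$r.squared, numeric(1)),
  row.names = NULL
)
print(results)
```

vapply is used instead of sapply here because it checks that every model returns exactly one number, which makes a malformed fit fail loudly rather than silently change the shape of the result.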

You can create a temporary subset in which you select only the columns used in your regression. This way, you won't need to inject the temporary name into the formula, because the dot in Total_Transactions ~ . stands for every other column in the subset.

Sticking to your code, this should do the trick.

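lm.test <- vector("list", length(col10)) # one slot per model, so earlier fits are not overwritten
for(i in seq_along(col10)){
 # keep only the response and the i-th selected column
 tempSubset <- data[, c("Total_Transactions", col10[i])]
 lm.test[[i]] <- lm(Total_Transactions ~ ., data = tempSubset)
}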
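Since the real spreadsheet is not available, here is a self-contained sketch of the subset approach on made-up data. The data frame and its column names (sales_10_a, sales_20_a, visits_10_b) are hypothetical stand-ins; only Total_Transactions and the "_10_" pattern come from the question.

```r
set.seed(1)  # reproducible made-up data

# Hypothetical stand-in for the imported spreadsheet
data <- data.frame(
  Total_Transactions = rnorm(20),
  sales_10_a  = rnorm(20),
  sales_20_a  = rnorm(20),
  visits_10_b = rnorm(20)
)

# grep(value = TRUE) returns the matching column names directly;
# fixed = TRUE matches "_10_" as a literal string
col10 <- grep("_10_", names(data), value = TRUE, fixed = TRUE)

# Fit one model per selected column on a two-column subset
lm.test <- lapply(col10, function(v) {
  tempSubset <- data[, c("Total_Transactions", v)]
  lm(Total_Transactions ~ ., data = tempSubset)
})
names(lm.test) <- col10

print(lapply(lm.test, coef))
```

Naming the list by column makes it possible to look up a model as lm.test[["sales_10_a"]] rather than by position.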

