
Is there a way to loop through column names (not numbers) in R for linear models?

I have a data sheet with 40 data columns (40 different nutrients), with additional columns for plot numbers and factors. I would like to automatically loop through each column name and produce a linear model and summary for each. The data columns begin at column 10.

for(i in 10:ncol(df)) {       # for-loop over columns
  mod2<-aov(i~block+tillage*residue+Error(subblock),data=df)
  summary(mod2)
}

This currently produces the error: Error in model.frame.default(formula = i ~ subblock, data = df, drop.unused.levels = TRUE): variable lengths differ (found for 'subblock'). The variable lengths are consistent, so I imagine I am looping incorrectly.

The data looks similar to below (with more categorical columns at the start), with the nutrient columns beginning at column 10.

block tillage residue subblock nutrient 1 nutrient 2 etc.
b1 NT NR s1 0.5 0.6

In general it is helpful to post a sample of your data using dput(). In the absence of that, I am going to use the built-in dataset mtcars to show how you can do this with formula():

head(mtcars)

#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# Select columns
desired_columns  <- names(mtcars)[!names(mtcars)=="mpg"]

for (column in desired_columns){
    this_formula = formula(paste("mpg ~ ", column))
    print(summary(lm(this_formula, data = mtcars)))
}

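This prints summary(lm(mpg ~ var)) for each var in the data. The key is the paste() call, which builds the expression as a string; formula() then turns it into a formula object. In your loop, i is only the column number, so aov() looks for a variable called i and finds the length-one loop counter rather than a column, which is why the variable lengths differ. Hopefully you can see how this can be applied to your data.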
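One caveat when pasting column names into a formula: names that are not syntactic R names, such as "nutrient 1" with a space, need backticks. The sketch below assumes a small made-up data frame (dat, with invented values) in roughly the shape shown in the question, and stores each fit under its column name.

```r
# Made-up data: one factor and two nutrient columns whose names contain a space.
dat <- data.frame(tillage = rep(c("NT", "CT"), each = 4))
dat[["nutrient 1"]] <- c(0.5, 0.6, 0.4, 0.7, 0.9, 1.0, 0.8, 1.1)
dat[["nutrient 2"]] <- c(1.5, 1.4, 1.6, 1.3, 1.2, 1.1, 1.0, 1.4)

# Loop over the column names, not the column numbers.
nutrient_columns <- setdiff(names(dat), "tillage")

fits <- list()
for (column in nutrient_columns) {
  # Backticks keep a name like "nutrient 1" as a single variable in the formula.
  this_formula <- formula(paste0("`", column, "` ~ tillage"))
  fits[[column]] <- lm(this_formula, data = dat)
}

# Each model can then be looked up by its column name.
print(summary(fits[["nutrient 1"]]))
```

If the column names are already syntactic (for example after check.names = TRUE on import), the backticks are harmless.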

You do not need a loop. You can just pass a matrix to the LHS of the formula:

dep <- names(iris)[names(iris) != "Species"]
f <- as.formula(sprintf("cbind(%s) ~ Species", paste(dep, collapse = ",")))

summary(lm(f, data = iris))
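With a matrix response, summary() returns one summary per response column, so individual results can still be pulled out by name. A minimal sketch on the same iris example (the object names fit, per_response and sepal_length are just illustrative):

```r
# Same setup as above: every non-Species column becomes a response.
dep <- names(iris)[names(iris) != "Species"]
f <- as.formula(sprintf("cbind(%s) ~ Species", paste(dep, collapse = ",")))

fit <- lm(f, data = iris)

# One list element per response, named "Response <column name>".
per_response <- summary(fit)
print(names(per_response))

# Pull out a single response's summary and inspect it as usual.
sepal_length <- per_response[["Response Sepal.Length"]]
print(sepal_length$r.squared)

# The coefficients come back as a matrix with one column per response.
print(dim(coef(fit)))
```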

Here is a simple base R solution:

model <- list()
model_summary <- list()
for(i in 10:ncol(df)) {       # for-loop over columns
  col <- colnames(df)[i]
  formula <- as.formula(paste0(col,"~block+tillage*residue+Error(subblock)"))
  model[[i - 9]] <- aov(formula, data = df)
  model_summary[[i - 9]] <- summary(model[[i - 9]])
}

Just create a new formula at each iteration using the name of the i-th column.

EDIT

As suggested in a comment by @Ben Bolker, you can achieve the same result more clearly and simply with reformulate(), by changing

formula <- as.formula(paste0(col,"~block+tillage*residue+Error(subblock)"))

to

formula <- reformulate("block+tillage*residue+Error(subblock)", response = col)
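To loop over names rather than positions, the same idea can be keyed by column name. The sketch below is only an illustration: the data frame dat is simulated, the column names nutrient1 and nutrient2 are invented, and subblock is assumed to be the block-by-tillage combination, which may not match your design.

```r
set.seed(1)

# Simulated split-plot style layout (an assumption, not the asker's data).
dat <- expand.grid(
  block   = paste0("b", 1:4),
  tillage = c("NT", "CT"),
  residue = c("NR", "R"),
  stringsAsFactors = TRUE
)
dat$subblock  <- interaction(dat$block, dat$tillage, drop = TRUE)
dat$nutrient1 <- rnorm(nrow(dat), mean = 0.5, sd = 0.1)
dat$nutrient2 <- rnorm(nrow(dat), mean = 0.6, sd = 0.1)

nutrient_names <- c("nutrient1", "nutrient2")
rhs <- "block + tillage * residue + Error(subblock)"

models <- list()
for (nm in nutrient_names) {
  # reformulate() builds e.g. nutrient1 ~ block + tillage * residue + Error(subblock)
  f <- reformulate(rhs, response = nm)
  models[[nm]] <- aov(f, data = dat)
}

# Summaries are stored under the same names as the models.
model_summaries <- lapply(models, summary)
print(model_summaries[["nutrient1"]])
```

Indexing by name avoids the i - 9 offset and keeps each result attached to the nutrient it belongs to.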

If you want the statistics in a table (which might come in handy), you can use the purrr and broom packages. Here's an example using the dataset mtcars:

Code

library(tidyr)
library(purrr)
library(broom)

formula <- lapply(colnames(mtcars)[3:ncol(mtcars)], function(x) as.formula(paste0(x, " ~ cyl")))

names(formula) <- format(formula)

table <- formula %>% map(~aov(.x, mtcars)) %>% map_dfr(tidy, .id="model")

Output

> head(table)
# A tibble: 6 x 7
  model      term         df     sumsq     meansq statistic   p.value
  <chr>      <chr>     <dbl>     <dbl>      <dbl>     <dbl>     <dbl>
1 disp ~ cyl cyl           1 387454.   387454.        131.   1.80e-12
2 disp ~ cyl Residuals    30  88731.     2958.         NA   NA       
3 hp ~ cyl   cyl           1 100984.   100984.         67.7  3.48e- 9
4 hp ~ cyl   Residuals    30  44743.     1491.         NA   NA       
5 drat ~ cyl cyl           1      4.34      4.34       28.8  8.24e- 6
6 drat ~ cyl Residuals    30      4.52      0.151      NA   NA    

For your data, try:

formula <- lapply(colnames(df)[10:ncol(df)], function(x) as.formula(paste0(x, " ~ block + tillage * residue + Error(subblock)")))

names(formula) <- format(formula)

table <- formula %>% map(~aov(.x, df)) %>% map_dfr(tidy, .id="model")   
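If adding purrr and broom is not an option, a similar table can be put together in base R. The sketch below uses mtcars again, with a plain aov() and no Error() term; the names responses and stats_table are illustrative choices rather than anything from the answer above.

```r
# Every column except mpg and cyl is used as a response, as in the mtcars example.
responses <- setdiff(names(mtcars), c("mpg", "cyl"))

rows <- lapply(responses, function(nm) {
  fit <- aov(reformulate("cyl", response = nm), data = mtcars)
  tab <- summary(fit)[[1]]            # the ANOVA table for this model
  data.frame(
    model     = paste(nm, "~ cyl"),
    term      = trimws(rownames(tab)), # row names are padded with spaces
    df        = tab[["Df"]],
    statistic = tab[["F value"]],
    p.value   = tab[["Pr(>F)"]]
  )
})

# Stack the per-model tables into one data frame.
stats_table <- do.call(rbind, rows)
print(head(stats_table))
```

A model with an Error() term returns one table per stratum from summary(), so this sketch would need an extra loop over strata to cover that case.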

purrr solution:

Without a minimal working example (MWE) it is difficult to help you. My approach would be to split your dataset into one dataset of dependent variables and one of independent variables. Then put each dependent variable into its own list element and bind the independent variables to it. You can then "loop" over the list and apply whichever regression you like.

library(dplyr)
library(purrr)

df <- mtcars

df_independent <- df %>%
  as_tibble() %>%
  # select independent variables
  select(9:10)

df_dependent <- df %>%
  as_tibble() %>%
  # select all dependent variables and store each column in a list
  select(1:8) %>%
  as.list() %>%
  map(as_tibble) %>%
  map(~ cbind(.x, df_independent))


df_dependent %>%
 # df_independent %>% colnames() %>% paste0(".x$",., collapse ="+")
  map(~ lm(.x$value ~ .x$am + .x$gear)) %>%
  map(summary)
