
How to perform linear regression between one column and all other columns in a data frame and save r squared values in new data frame?

I am trying to test how similar the data from one column (CS.1) in my data frame is to the rest of the columns in the data frame (allree). The data frame has 283 columns, and the first one contains labels for the observations. I attempted setting up a for loop to perform the linear regression and save the r-squared value along with the column name in a new data frame. However, I keep receiving errors about the data frame for the results being the incorrect length.

This is the code:

#this is the data frame
allree<-read.csv("All REE 2.csv")

#creating the data frame for the results
cs1 <- data.frame(row = 1:280)
dat <- data.frame(rsq = 1:3, samp = 1:3)

#trying to test each column against the second column (CS.1) and save the r-squared values
for(x in 3:283){
  na.rm=TRUE
  reg<-lm(CS.1~allree[,x], data=allree)
  rsq<-summary(reg)$r.squared
  dat$r2[x] <- rsq
  dat$sample[x] <- colnames(allree)[x]
  if(x==3) cs1<-dat
  if(x>3)cs1<-rbind(cs1, dat)
  }

This is the error:

Error in `$<-.data.frame`(`*tmp*`, "r2", value = c(NA, NA, 0.180399384405891, : replacement has 4 rows, data has 3

Do I need to break the original data into multiple data frames? I would like to repeat this test for a couple other columns if I can figure it out this way.

Since you did not provide a reproducible example, I will do it using the mtcars dataframe.

Instead of using a for loop, I will use functions from the purrr, broom, and dplyr packages.

Data

This data frame ships with R by default.

dplyr::glimpse(mtcars)
Observations: 32
Variables: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, …
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, …
$ disp <dbl> 160.0, 160.0, 108.0, 258…
$ hp   <dbl> 110, 110, 93, 110, 175, …
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, …
$ wt   <dbl> 2.620, 2.875, 2.320, 3.2…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, …
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, …
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, …
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, …

Code

# Load dplyr so the %>% pipe is available
library(dplyr)

# purrr::map() iterates over a list, much like a loop
purrr::map(

  # set the elements to iterate over.
  # in this case all variables except 
  # the first (mpg)
  mtcars[,-1],

  # The second argument is the body of the loop:
  # an lm() call written as a formula.
  # Here .x is replaced by a different column
  # in each iteration
  ~lm(mpg ~ .x, data = mtcars)
  ) %>%

  # Then summarise each output with broom::glance
  purrr::map(broom::glance) %>%

  # bind all summaries into one data frame
  dplyr::bind_rows(.id = "variable") %>%

  # selecting the variables of interest
  dplyr::select(variable, r.squared)

# A tibble: 10 x 2
   variable r.squared
   <chr>        <dbl>
 1 cyl          0.726
 2 disp         0.718
 3 hp           0.602
 4 drat         0.464
 5 wt           0.753
 6 qsec         0.175
 7 vs           0.441
 8 am           0.360
 9 gear         0.231
10 carb         0.304
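
If you would rather keep a for loop, here is a minimal base R sketch. The error in the question most likely arises because dat has only 3 rows, so dat$r2[x] becomes a 4-element vector once x reaches 4. Preallocating one row per predictor and indexing by position avoids that. The helper name r2_against is hypothetical (it is not from any package), and the sketch assumes every column passed in is numeric.

```r
# Hypothetical helper: regress `response` on every other column of `data`,
# one predictor at a time, and collect the R-squared values.
r2_against <- function(data, response) {
  predictors <- setdiff(names(data), response)

  # Preallocate one row per predictor so nothing grows inside the loop
  out <- data.frame(
    variable  = predictors,
    r.squared = NA_real_,
    stringsAsFactors = FALSE
  )

  for (i in seq_along(predictors)) {
    # reformulate() builds a formula such as mpg ~ cyl from column names
    f   <- reformulate(predictors[i], response = response)
    fit <- lm(f, data = data)
    out$r.squared[i] <- summary(fit)$r.squared
  }
  out
}

res <- r2_against(mtcars, "mpg")
print(res)

# Cross-check: with a single predictor, R-squared equals the squared
# Pearson correlation between response and predictor
check <- cor(mtcars[["mpg"]], mtcars[setdiff(names(mtcars), "mpg")])^2
```

Because the response is passed by name, repeating the test for another column only requires a different second argument. For the data in the question, something like r2_against(allree[-1], "CS.1") should work, assuming the label column is the first one and the remaining columns are numeric.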

A correction to the previous answer: the function name is spelled glimpse, so the call should be

glimpse(mtcars)


 