I am trying to test how similar the data in one column (CS.1) of my data frame (allree) is to each of the other columns. The data frame has 283 columns, and the first one contains labels for the observations. I set up a for loop that runs a linear regression for each column and saves the r-squared value, along with the column name, in a new data frame. However, I keep getting errors saying that the results data frame is the wrong length.
This is the code:
#this is the data frame
allree <- read.csv("All REE 2.csv")
#creating the data frame for the results
cs1 <- data.frame(row = 1:280)
dat <- data.frame(rsq = 1:3, samp = 1:3)
#trying to test each column against the second column (CS.1) and save the r-squared values
for (x in 3:283) {
  na.rm = TRUE
  reg <- lm(CS.1 ~ allree[, x], data = allree)
  rsq <- summary(reg)$r.squared
  dat$r2[x] <- rsq
  dat$sample[x] <- colnames(allree)[x]
  if (x == 3) cs1 <- dat
  if (x > 3) cs1 <- rbind(cs1, dat)
}
This is the error:
Error in `$<-.data.frame`(`*tmp*`, "r2", value = c(NA, NA, 0.180399384405891, : replacement has 4 rows, data has 3
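The error can be reproduced in isolation. dat has 3 rows. On the first pass (x = 3), dat$r2[3] <- rsq creates a new column c(NA, NA, rsq) of length 3, which fits. On the next pass (x = 4), dat$r2[4] <- rsq would make that column length 4, and the data frame assignment method refuses a 4-element replacement for a 3-row data frame. A minimal sketch (the numbers are placeholders, not real r-squared values):

```r
# Same 3-row results table as in the question
dat <- data.frame(rsq = 1:3, samp = 1:3)

# x = 3: indexing the missing column r2 builds c(NA, NA, 0.18),
# whose length (3) equals nrow(dat), so the assignment succeeds
dat$r2[3] <- 0.18

# x = 4: the column would grow to length 4, which no longer matches nrow(dat)
err <- tryCatch(
  {
    dat$r2[4] <- 0.5
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(err)
```

In an English locale the captured message should be the same "replacement has 4 rows, data has 3" shown above.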
Do I need to break the original data into multiple data frames? I would like to repeat this test for a couple of other columns if I can get it working this way.
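Splitting the data should not be necessary. One way to keep the for loop is to create the results table once, with one row per predictor column, and fill it by position instead of writing to dat$r2[x] and rbind-ing copies of dat. A minimal sketch, using mtcars as a stand-in for allree (mpg plays the role of CS.1; the names df, target, cols and results are illustrative):

```r
df <- mtcars      # stand-in for allree
target <- "mpg"   # stand-in for "CS.1"

# Every column except the response
# (with allree you would also drop the label column first)
cols <- setdiff(names(df), target)

# Preallocate one row per predictor column
results <- data.frame(sample = cols, r2 = NA_real_)

for (i in seq_along(cols)) {
  # lm() drops incomplete rows under the default na.action (na.omit),
  # so no na.rm setting is needed
  fit <- lm(df[[target]] ~ df[[cols[i]]])
  results$r2[i] <- summary(fit)$r.squared
}

results
```

With the question's data this would start from df <- allree[, -1] and target <- "CS.1", assuming the remaining columns are numeric.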
Since you did not provide a reproducible example, I will use the mtcars data frame.
Instead of a for loop, I will use functions from the purrr, broom and dplyr packages.
Data
This data frame comes with R by default (it is part of the datasets package).
dplyr::glimpse(mtcars)
Observations: 32
Variables: 11
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, …
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, …
$ disp <dbl> 160.0, 160.0, 108.0, 258…
$ hp <dbl> 110, 110, 93, 110, 175, …
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, …
$ wt <dbl> 2.620, 2.875, 2.320, 3.2…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, …
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, …
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, …
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, …
Code
# dplyr is loaded for the %>% pipe; other functions are called with pkg::
library(dplyr)

# The map function from purrr lets you loop over elements
purrr::map(
  # Elements to iterate over: every column
  # except the first (mpg)
  mtcars[, -1],
  # The second argument is the body of the loop:
  # an lm call with the formula below, where .x
  # is replaced by a different column in each iteration
  ~ lm(mpg ~ .x, data = mtcars)
) %>%
  # Summarise each fitted model with broom::glance
  purrr::map(broom::glance) %>%
  # Bind all summaries, storing the list names in "variable"
  dplyr::bind_rows(.id = "variable") %>%
  # Keep only the variables of interest
  dplyr::select(variable, r.squared)
# A tibble: 10 x 2
variable r.squared
<chr> <dbl>
1 cyl 0.726
2 disp 0.718
3 hp 0.602
4 drat 0.464
5 wt 0.753
6 qsec 0.175
7 vs 0.441
8 am 0.360
9 gear 0.231
10 carb 0.304
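To repeat the comparison for a couple of other columns, the same idea can be wrapped in a function and applied to each target column in turn. This is a base-R alternative rather than part of the purrr/broom approach above, and the helper name rsq_against is hypothetical. It returns the same variable and r.squared columns as the tibble above, plus a target column:

```r
# Hypothetical helper: r-squared of `target` regressed on each other column of df
rsq_against <- function(df, target) {
  predictors <- setdiff(names(df), target)
  r2 <- vapply(predictors, function(col) {
    summary(lm(df[[target]] ~ df[[col]]))$r.squared
  }, numeric(1))
  data.frame(target = target, variable = predictors, r.squared = r2,
             row.names = NULL)
}

# Run it for several response columns and stack the results
targets <- c("mpg", "hp", "wt")
all_rsq <- do.call(rbind, lapply(targets, function(t) rsq_against(mtcars, t)))

head(all_rsq)
```

With the question's data, rsq_against(allree[, -1], "CS.1") would drop the label column first, assuming the remaining columns are numeric. Because each model has a single predictor, each r-squared equals the squared Pearson correlation between the two columns, so cor(df)^2 gives the same numbers for every pair at once (missing values are handled through cor's use argument).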
Correction: the typo in the call in the answer above should read
glimpse(mtcars)