
Save iterations of for loop in R

I'm working on a project where I need to collect the intercept, slope, and R squared of several linear regressions. Since I need at least 200 samples of different sample sizes, I set up the code below, but it only saves the last iteration of the loop. Any suggestions on how I can record each iteration so that I have all of the coefficients and R-squared values that I require?

for (i in 1:5) {
  x <- as.data.frame(mydf[sample(1:1000,25,replace=FALSE),])
  mylm <- lm(spd66305~spd66561, data=x) 
  coefs <- rbind(lman(mylm))
  total.coefs <- rbind(coefs)
}
total.coefs

The function used in the loop is below if that is needed.

lman <- function(mylm){
  r2 <- summary(mylm)$r.squared
  r <- sqrt(r2)
  intercept <- coef(mylm)[1]
  slope <- coef(mylm)[2]
  tbl <- c(intercept,slope,r2,r)
}

Thanks for the help.

Your loop overwrites total.coefs on every pass, which is why only the last iteration survives. Before starting the loop, initialise an empty data frame with total.coefs <- data.frame() . Then, inside the loop, append to it instead of replacing it: total.coefs <- rbind(total.coefs, coefs) . Finally, replace the last line of lman with tbl <- data.frame(intercept=intercept, slope=slope, r2=r2, r=r) , so that each result is a one-row data frame with named columns.
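Put together, the fix might look like the sketch below. The original mydf is not shown in the question, so a simulated stand-in with the same two column names is generated here purely for illustration; the seed and the simulated coefficients are assumptions, not part of the original post.

```r
# Stand-in for the asker's data (assumption: 1000 rows, two numeric columns)
set.seed(1)
mydf <- data.frame(spd66561 = rnorm(1000))
mydf$spd66305 <- 2 + 0.5 * mydf$spd66561 + rnorm(1000)

# lman returning a one-row data frame instead of an unnamed vector
lman <- function(mylm) {
  r2 <- summary(mylm)$r.squared
  data.frame(intercept = coef(mylm)[[1]],
             slope     = coef(mylm)[[2]],
             r2        = r2,
             r         = sqrt(r2))
}

# Start empty, then append one row per iteration
total.coefs <- data.frame()
for (i in 1:5) {
  x <- mydf[sample(1:1000, 25, replace = FALSE), ]
  mylm <- lm(spd66305 ~ spd66561, data = x)
  total.coefs <- rbind(total.coefs, lman(mylm))
}
total.coefs
```

After the loop, total.coefs should hold one row per iteration, with the columns intercept, slope, r2 and r.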

Here's how I'd do it, using the mtcars data as an example. Note: it's not advisable to call rbind inside the loop when you're building up a data structure, because the growing object is copied on every iteration. You can call rbind once after the looping is done, and things are much less stressful. I prefer to do this type of operation with a list.

Here I wrap the lapply call in do.call(rbind, ...) , which passes all of the list elements to rbind in a single call and binds them into one matrix. Another thing to note is that I take the samples before entering the loop. This makes debugging easier and can be more efficient overall.

reps <- replicate(3, sample(nrow(mtcars), 5), simplify = FALSE)
do.call(rbind, lapply(reps, function(x) {
    mod <- lm(mpg ~ hp, mtcars[x,])
    c(coef(mod), R = summary(mod)$r.squared)
}))
#      (Intercept)          hp         R
# [1,]    33.29360 -0.08467169 0.5246208
# [2,]    29.97636 -0.06043852 0.4770310
# [3,]    28.33462 -0.05113847 0.8514720

The following transposed vapply call produces the same result, and is often faster when you know the type and length of the result you expect.

t(vapply(reps, function(x) {
    mod <- lm(mpg ~ hp, mtcars[x,])
    c(coef(mod), R = summary(mod)$r.squared)
}, numeric(3))) 
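To check that the two approaches agree, both can be run on the same set of samples and compared. This is only a sketch: the seed and the helper name fit are assumptions added for illustration, and the actual numbers depend on which rows are drawn.

```r
set.seed(42)  # assumed seed, only so the samples are repeatable

# Draw the row indices once so both approaches see identical samples
reps <- replicate(3, sample(nrow(mtcars), 5), simplify = FALSE)

# Hypothetical helper: fit one model and return its coefficients and R squared
fit <- function(x) {
  mod <- lm(mpg ~ hp, mtcars[x, ])
  c(coef(mod), R = summary(mod)$r.squared)
}

via_lapply <- do.call(rbind, lapply(reps, fit))
via_vapply <- t(vapply(reps, fit, numeric(3)))

# Compare the two matrices
all.equal(via_lapply, via_vapply)
```

vapply also stops with an error if any iteration returns something other than a numeric vector of length 3, which can catch a failed fit early.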

Another way to record each iteration is to make the work reproducible and keep your datasets around, in case you run into extreme values, missing values, new questions about the datasets, or other surprises that need to be investigated.

This is a similar case using the iris dataset.

# create sample data
data(iris)
iris <- iris[ ,c('Sepal.Length','Petal.Length')]

# your function with data.frame fix on last line
lman <- function(mylm){
  r2 <- summary(mylm)$r.squared
  r <- sqrt(r2)
  intercept <- coef(mylm)[1]
  slope <- coef(mylm)[2]
  data.frame(intercept,slope,r2,r)
}

# set seed to make reproducible
set.seed(3)

# create all datasets
alldatasets <- lapply(1:200,function(x,df){
  df[sample(1:nrow(df),size = 50,replace = FALSE), ]
},df = iris)

# create all models based on alldatasets
allmodels <- lapply(alldatasets,lm,formula = Sepal.Length ~ Petal.Length)

# run custom function on all models
lmanresult <- lapply(allmodels,lman)

# format results
result <- do.call('rbind',lmanresult)
row.names(result) <- NULL

# inspect the 129th sample, model, and result
alldatasets[[129]]
summary(allmodels[[129]])
result[129, ]
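The question also mentions different sample sizes. One possible way to extend the same pattern is sketched below; the names sizes, reps_per_size and fit_once, as well as the particular sizes chosen, are hypothetical and not taken from the original post.

```r
set.seed(3)
df <- iris[, c("Sepal.Length", "Petal.Length")]

sizes <- c(25, 50, 100)   # assumed sample sizes
reps_per_size <- 200      # samples drawn per size

# Hypothetical helper: draw one sample of size n, fit the model, return one row
fit_once <- function(n, df) {
  s <- df[sample(nrow(df), size = n, replace = FALSE), ]
  mod <- lm(Sepal.Length ~ Petal.Length, data = s)
  r2 <- summary(mod)$r.squared
  data.frame(n = n,
             intercept = coef(mod)[[1]],
             slope     = coef(mod)[[2]],
             r2        = r2,
             r         = sqrt(r2))
}

# One data frame per sample size, then bind them all together
by_size <- lapply(sizes, function(n) {
  do.call(rbind, lapply(seq_len(reps_per_size), function(i) fit_once(n, df)))
})
result_sizes <- do.call(rbind, by_size)

# Number of fitted models per sample size
table(result_sizes$n)
```

Keeping n as a column makes it straightforward to compare the spread of the slopes or R-squared values across sample sizes afterwards.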
