Combining regression summary outputs from multiple samples into a single data frame in R

Question

I'm trying to combine the outputs of multiple lm fits into one data frame for further calculations. I have a dataset of 1000 observations and 62 variables. The project is to randomly split the dataset 63/37, train the model, repeat this 1000 times, and save the coefficients, the fitted values, and the R-squared for all 1000 runs. Here is most of that (using mtcars):

data("mtcars")
f <- function () {
  fit <- lm(mpg ~ ., data = mtcars, subset = sample <- sample.int(n = nrow(mtcars), size = floor(.63*nrow(mtcars)), replace = F))
  coef(fit)
}
output <- t(replicate(1000, f()))

I know I can get the R-squared values with summary(fit)$r.squared and use predict() to get the fitted values. I'm just struggling with how to get them into the data frame alongside the saved coefficients.

Answer 1

The following should work:

get_model <- function(input_data) {
    # Fit on a random 63% subsample of the rows (sampling without replacement).
    train_rows <- sample.int(n = nrow(input_data),
                             size = floor(0.63 * nrow(input_data)),
                             replace = FALSE)
    lm(mpg ~ ., data = input_data[train_rows, ])
}

get_results <- function(lm_model) {
    # One-row data frame: coefficients named by term, plus R-squared.
    data <- as.data.frame(t(coef(lm_model)))
    data$rsquared <- summary(lm_model)$r.squared
    data
}


# running the above
input_data <- mtcars
results <- vector("list", 1000)

for (i in seq_along(results)) {
    my_model     <- get_model(input_data)
    results[[i]] <- get_results(my_model)
}

# Bind once at the end rather than growing the data frame inside the loop.
general_df <- do.call(rbind, results)
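The code above keeps the coefficients and R-squared but not the fitted values the question also asks for. One possible way to hold everything in a single data frame is sketched below in base R. The helper name one_split and the NA-padded layout (one column per original row, NA where that row was not in the training sample) are my own assumptions, not something from the answer.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical helper: one random split, returned as a single named vector.
one_split <- function(data, train_frac = 0.63) {
  rows <- sample.int(nrow(data), floor(train_frac * nrow(data)))
  fit  <- lm(mpg ~ ., data = data[rows, ])

  # One slot per original row; rows outside the training sample stay NA.
  fitted_all <- setNames(rep(NA_real_, nrow(data)), rownames(data))
  fitted_all[names(fitted(fit))] <- fitted(fit)

  c(coef(fit), rsquared = summary(fit)$r.squared, fitted_all)
}

# 1000 runs: one row per run, columns = coefficients, R-squared, fitted values.
results <- as.data.frame(t(replicate(1000, one_split(mtcars))))
dim(results)
```

Because every run returns a vector with the same names in the same order, replicate() can simplify the runs into a matrix and the column labels stay meaningful across rows.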

Answer 2

You are very close:

library(modelr)
data("mtcars")

get_data_lm <- function(data_df, testPCT = 0.37){

    # One random train/test partition; train and test are list-columns of length 1.
    data_resample <- modelr::crossv_mc(data_df, n = 1, test = testPCT)
    fit <- lm(mpg ~ ., data = as.data.frame(data_resample$train[[1]]))

    stats <- c(coef(fit),
               "R2" = summary(fit)$r.squared,
               "AdjR2" = summary(fit)$adj.r.squared)
    # Predictions for the held-out rows; the rows differ from run to run.
    pred_vals <- predict(fit, newdata = as.data.frame(data_resample$test[[1]]))

    c(stats, pred_vals)

}

# One row per run. The prediction columns take their names from the first run only.
output <- t(replicate(1000, get_data_lm(mtcars)))

The only thing you needed to do is concatenate the other statistics and predicted values you want. Alternatively, you could use a parallel sapply() variant to make your simulation considerably faster.
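As a rough illustration of the parallel variant mentioned here, the sketch below uses parSapply() from the base parallel package. The helper name one_run, the two-worker cluster, and the seed are assumptions made for the example; whether this is actually faster depends on the machine and on how expensive each fit is.

```r
library(parallel)

# Hypothetical helper: one random 63/37 split, returning coefficients and R2.
# The first argument is only the run index supplied by parSapply().
one_run <- function(i, data, train_frac = 0.63) {
  rows <- sample.int(nrow(data), floor(train_frac * nrow(data)))
  fit  <- lm(mpg ~ ., data = data[rows, ])
  c(coef(fit), R2 = summary(fit)$r.squared)
}

cl <- makeCluster(2)                 # assumption: two worker processes
clusterSetRNGStream(cl, iseed = 1)   # reproducible random streams on the workers
par_output <- t(parSapply(cl, seq_len(1000), one_run, data = mtcars))
stopCluster(cl)

dim(par_output)
```

For a model this small the cost of starting the workers may outweigh the gain, so it is worth timing both versions on the real 1000 x 62 dataset.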

Another comment: I use the crossv_mc() function from the modelr package to create one testing and training partition. However, I could have used n = 1000 outside the function instead; this would have created a resample data frame in my working environment for me to apply() a function over. See the modelr GitHub page for more info.
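A minimal sketch of that n = 1000 alternative might look like the following. It assumes modelr is installed; the names resamples, fit_stats, and output_mc are made up for the example, and only the training partitions are used here.

```r
library(modelr)

# All 1000 train/test partitions are created up front, outside any function.
resamples <- crossv_mc(mtcars, n = 1000, test = 0.37)

# Hypothetical helper: fit on one training partition, return coefficients and R2.
fit_stats <- function(train) {
  fit <- lm(mpg ~ ., data = as.data.frame(train))
  c(coef(fit), R2 = summary(fit)$r.squared)
}

# One row per resample; resamples$train is a list of resample objects.
output_mc <- t(sapply(resamples$train, fit_stats))
dim(output_mc)
```

Keeping the partitions in resamples also means the matching test sets in resamples$test are still available afterwards if predictions are needed.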

Answer 1: score 0, posted 2018-02-12 20:55:11

Answer 2: score -1, accepted, posted 2018-02-12 22:44:02
