Summary data frame from several multiple regression outputs

Question

I am running several OLS regressions. I have used the following lm function:

GroupNetReturnsStockPickers <- read.csv("GroupNetReturnsStockPickers.csv", header=TRUE, sep=",", dec=".")
ModelGroupNetReturnsStockPickers <- lm(StockPickersNet ~ Mkt.RF+SMB+HML+WML, data=GroupNetReturnsStockPickers)
names(GroupNetReturnsStockPickers)
summary(ModelGroupNetReturnsStockPickers)

This gives me the following summary output:

Call:
lm(formula = StockPickersNet ~ Mkt.RF + SMB + HML + WML, data = GroupNetReturnsStockPickers)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.029698 -0.005069 -0.000328  0.004546  0.041948 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.655e-05  5.981e-04   0.078    0.938
Mkt.RF      -1.713e-03  1.202e-02  -0.142    0.887
SMB          3.006e-02  2.545e-02   1.181    0.239
HML          1.970e-02  2.350e-02   0.838    0.403
WML          1.107e-02  1.444e-02   0.766    0.444

Residual standard error: 0.009029 on 251 degrees of freedom
Multiple R-squared:  0.01033,   Adjusted R-squared:  -0.005445 
F-statistic: 0.6548 on 4 and 251 DF,  p-value: 0.624

This is perfect. However, I am running a total of 10 multiple OLS regressions, and I want to build my own summary output in a data frame, extracting the intercept estimate, its t value, and its p-value from each of the 10 analyses. The result would therefore be a 3x10 data frame, with column names Model1, Model2, ..., Model10 and row names Value, t-value and p-Value.

I appreciate any help.

Answer 1

There are a few packages that do this (stargazer and texreg), as well as this code for outreg.

In any case, if you are only interested in the intercept, here is one approach:

# Estimate a bunch of different models, stored in a list
fits <- list() # Create empty list to store models
fits$model1 <- lm(Ozone ~ Solar.R, data = airquality)
fits$model2 <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fits$model3 <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# Combine the results for the intercept
do.call(cbind, lapply(fits, function(z) summary(z)$coefficients["(Intercept)", ]))


# RESULT:
#                  model1       model2        model3
# Estimate   18.598727772 7.724604e+01 -64.342078929
# Std. Error  6.747904163 9.067507e+00  23.054724347
# t value     2.756222869 8.518995e+00  -2.790841389
# Pr(>|t|)    0.006856021 1.052118e-13   0.006226638
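
The approach above returns all four statistics as a matrix. To get closer to the layout requested in the question (rows Value, t-value and p-Value, one column per model), the following is a minimal sketch in base R. It reuses the airquality example models; the names Model1 to Model3 and intercept_table are chosen here for illustration and are not part of the original answer.

```r
# Fit the same three example models as above, stored in a named list
fits <- list(
  Model1 = lm(Ozone ~ Solar.R, data = airquality),
  Model2 = lm(Ozone ~ Solar.R + Wind, data = airquality),
  Model3 = lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
)

# For each model, keep only the estimate, t value and p-value of the intercept
intercepts <- sapply(fits, function(z) {
  coef(summary(z))["(Intercept)", c("Estimate", "t value", "Pr(>|t|)")]
})

# Relabel the rows with the names requested in the question
rownames(intercepts) <- c("Value", "t-value", "p-Value")

# Convert the matrix to a data frame: one column per model
intercept_table <- as.data.frame(intercepts)
intercept_table
```

With ten models in the list, the same code should give a data frame with three rows and ten columns.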

Answer 2

Look at the broom package, which was created to do exactly what you are asking for. The only difference is that it puts the models into rows and the statistics into columns. I understand that you would prefer the opposite, but you can transpose the result afterwards if it is really necessary.

To give you an example, the function tidy() converts a model output into a data frame.

model <- lm(mpg ~ cyl, data=mtcars)
summary(model) 

Call:
lm(formula = mpg ~ cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

And

 library(broom)
 tidy(model)

yields the following data frame:

         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

Look at ?tidy.lm to see more options, for instance confidence intervals.

To combine the output of your ten models into one data frame, you could use the following (where one, two, three, ... stand for the tidy() result of each model):

library(dplyr)
bind_rows(one, two, three, ... , .id="models")
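
As a concrete illustration of the bind_rows() call, here is a minimal sketch that assumes the models are kept in a named list. The three mtcars models and the names fits, all_terms and intercepts are hypothetical stand-ins for the ten models in the question. It stacks the tidy() outputs and then keeps only the intercept rows.

```r
library(broom)
library(dplyr)

# Hypothetical example models; replace with your own fitted lm objects
fits <- list(
  Model1 = lm(mpg ~ cyl, data = mtcars),
  Model2 = lm(mpg ~ cyl + wt, data = mtcars),
  Model3 = lm(mpg ~ cyl + wt + hp, data = mtcars)
)

# tidy() each model, then stack the results; the list names end up
# in the "model" column via the .id argument
all_terms <- bind_rows(lapply(fits, tidy), .id = "model")

# Keep only the intercept row of each model and the three statistics of interest
intercepts <- all_terms %>%
  filter(term == "(Intercept)") %>%
  select(model, estimate, statistic, p.value)

intercepts
```

This gives one row per model; the statistic column holds the t value.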

Or, if your models are the same regression fitted to different groups of a single data frame, you can combine tidy() with dplyr:

# Fit mpg ~ cyl separately for each value of gear and tidy each fit
models <- mtcars %>%
  group_by(gear) %>%
  do(data.frame(tidy(lm(mpg ~ cyl, data = .), conf.int = TRUE)))

Source: local data frame [6 x 8]
Groups: gear

  gear        term  estimate std.error statistic      p.value  conf.low  conf.high
1    3 (Intercept) 29.783784 4.5468925  6.550360 1.852532e-05 19.960820 39.6067478
2    3         cyl -1.831757 0.6018987 -3.043297 9.420695e-03 -3.132080 -0.5314336
3    4 (Intercept) 41.275000 5.9927925  6.887440 4.259099e-05 27.922226 54.6277739
4    4         cyl -3.587500 1.2587382 -2.850076 1.724783e-02 -6.392144 -0.7828565
5    5 (Intercept) 40.580000 3.3238331 12.208796 1.183209e-03 30.002080 51.1579205
6    5         cyl -3.200000 0.5308798 -6.027730 9.153118e-03 -4.889496 -1.5105036
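
In recent dplyr versions (1.0.0 and later) do() is marked as superseded. A possible alternative for the grouped regression, sketched here under the assumption that dplyr 0.8.1 or later is installed, uses group_modify(). This is a variation on the answer's code, not part of it.

```r
library(broom)
library(dplyr)

# Fit mpg ~ cyl within each gear group; .x is the data for the current group
models <- mtcars %>%
  group_by(gear) %>%
  group_modify(~ tidy(lm(mpg ~ cyl, data = .x), conf.int = TRUE)) %>%
  ungroup()

models
```

The result should have the same six rows and eight columns as the output above, returned as a tibble.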
