
Summary dataframe from several multiple regression outputs

I am running multiple OLS regressions. I have used the following lm call:

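# Read the data (header, sep and dec are spelled out, although these are read.csv's defaults)
GroupNetReturnsStockPickers <- read.csv("GroupNetReturnsStockPickers.csv", header=TRUE, sep=",", dec=".")
# Regress StockPickersNet on the four factor columns
ModelGroupNetReturnsStockPickers <- lm(StockPickersNet ~ Mkt.RF+SMB+HML+WML, data=GroupNetReturnsStockPickers)
names(GroupNetReturnsStockPickers)  # list the column names in the data
summary(ModelGroupNetReturnsStockPickers)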

This gives me the following summary output:

Call:
lm(formula = StockPickersNet ~ Mkt.RF + SMB + HML + WML, data = GroupNetReturnsStockPickers)

Residuals:
  Min        1Q    Median        3Q       Max 
-0.029698 -0.005069 -0.000328  0.004546  0.041948 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.655e-05  5.981e-04   0.078    0.938
Mkt.RF      -1.713e-03  1.202e-02  -0.142    0.887
SMB          3.006e-02  2.545e-02   1.181    0.239
HML          1.970e-02  2.350e-02   0.838    0.403
WML          1.107e-02  1.444e-02   0.766    0.444

Residual standard error: 0.009029 on 251 degrees of freedom
Multiple R-squared:  0.01033,   Adjusted R-squared:  -0.005445 
F-statistic: 0.6548 on 4 and 251 DF,  p-value: 0.624

This is perfect. However, I am running a total of 10 multiple OLS regressions. I want to build my own summary output as a data frame that extracts the intercept estimate, its t-value and its p-value from each of the 10 analyses. It would therefore be a 3x10 data frame, with column names Model1, Model2, ..., Model10 and row names Value, t-value and p-Value.

I appreciate any help.

There are a few packages that do this (stargazer and texreg), as well as this code for outreg.

In any case, if you are only interested in the intercept, here is one approach:

# Estimate a bunch of different models, stored in a list
fits <- list() # Create empty list to store models
fits$model1 <- lm(Ozone ~ Solar.R, data = airquality)
fits$model2 <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fits$model3 <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# Combine the results for the intercept
do.call(cbind, lapply(fits, function(z) summary(z)$coefficients["(Intercept)", ]))


# RESULT:
#                  model1       model2        model3
# Estimate   18.598727772 7.724604e+01 -64.342078929
# Std. Error  6.747904163 9.067507e+00  23.054724347
# t value     2.756222869 8.518995e+00  -2.790841389
# Pr(>|t|)    0.006856021 1.052118e-13   0.006226638
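The result above already has one column per model, but it keeps all four statistics. The following minimal sketch produces exactly the layout asked for in the question: rows Value, t-value and p-Value, with one column per model. It selects the three columns by name and relabels them. The airquality models stand in for the asker's ten regressions, and the object names intercepts and intercept_summary are illustrative only.

```r
# Example models on the built-in airquality data, standing in for the real ten;
# a named list so the names become the column names of the summary.
fits <- list(
  Model1 = lm(Ozone ~ Solar.R, data = airquality),
  Model2 = lm(Ozone ~ Solar.R + Wind, data = airquality),
  Model3 = lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
)

# For each model, take the intercept row of the coefficient matrix and keep
# only the estimate, t value and p-value columns.
intercepts <- sapply(fits, function(z) {
  coef(summary(z))["(Intercept)", c("Estimate", "t value", "Pr(>|t|)")]
})

# sapply() returns a 3 x n matrix (one column per model); relabel the rows
# as requested and convert to a data frame.
rownames(intercepts) <- c("Value", "t-value", "p-Value")
intercept_summary <- as.data.frame(intercepts)
intercept_summary
```

Extending this to ten models only requires adding entries to fits. The summary keeps the list names (Model1 to Model10) as its column names.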

Look at the broom package, which was created to do exactly what you are asking for. The only difference is that it puts the models into rows and the different statistics into columns. I understand that you would prefer the opposite, but you can transpose the result afterwards if that is really necessary.

For example, the function tidy() converts a fitted model into a data frame.

model <- lm(mpg ~ cyl, data=mtcars)
summary(model) 

Call:
lm(formula = mpg ~ cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

And

library(broom)
tidy(model)

yields the following data frame:

         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

See ?tidy.lm for more options, for instance confidence intervals.
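If broom is not available, you can build a rough base-R approximation of tidy() from the coefficient matrix of summary() and from confint(). The helper name tidy_lm_base is hypothetical and is not part of broom. Its output only mirrors the columns shown above and does not cover every option of tidy.lm.

```r
# Hypothetical helper (not part of broom): a base-R approximation of
# tidy(fit) / tidy(fit, conf.int = TRUE) for an lm fit.
tidy_lm_base <- function(fit, conf.int = FALSE, conf.level = 0.95) {
  cf <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  out <- data.frame(
    term      = rownames(cf),
    estimate  = cf[, "Estimate"],
    std.error = cf[, "Std. Error"],
    statistic = cf[, "t value"],
    p.value   = cf[, "Pr(>|t|)"],
    row.names = NULL
  )
  if (conf.int) {
    # For lm, confint() returns t-based intervals, one row per coefficient
    ci <- confint(fit, level = conf.level)
    out$conf.low  <- ci[, 1]
    out$conf.high <- ci[, 2]
  }
  out
}

model <- lm(mpg ~ cyl, data = mtcars)
res <- tidy_lm_base(model, conf.int = TRUE)
res
```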

To combine the tidied output of your ten models into one data frame, you could use

library(dplyr)
# one, two, three, ... stand for the data frames returned by tidy() for each model
bind_rows(one, two, three, ... , .id="models")
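If the models are already in a list, you do not need to type out the one, two, three, ... arguments. The following sketch is a base-R analogue of bind_rows(..., .id = "models"). It assumes a named list called fits (here three airquality models as stand-ins), and the list names end up in the models column.

```r
# Assumed setup: a named list of lm fits (stand-ins for the ten real models)
fits <- list(
  model1 = lm(Ozone ~ Solar.R, data = airquality),
  model2 = lm(Ozone ~ Solar.R + Wind, data = airquality),
  model3 = lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
)

# Convert each model's coefficient table to a data frame tagged with the
# model's list name, then stack them into one long table.
per_model <- Map(function(fit, name) {
  cf <- coef(summary(fit))
  data.frame(models = name, term = rownames(cf),
             estimate = cf[, "Estimate"], statistic = cf[, "t value"],
             p.value = cf[, "Pr(>|t|)"], row.names = NULL)
}, fits, names(fits))

stacked <- do.call(rbind, per_model)
rownames(stacked) <- NULL  # drop the "model1.1"-style row names rbind creates
stacked
```

With broom and dplyr loaded, bind_rows(lapply(fits, tidy), .id = "models") should produce the same kind of long table, because bind_rows() takes the id labels from the names of the list.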

Or, if your models are the same regression fitted on different groups of one data frame, you can combine tidy() with dplyr's grouping:

models <- mtcars %>% group_by(gear) %>% do(data.frame(tidy(lm(mpg ~ cyl, data = .), conf.int = TRUE)))
models

Source: local data frame [6 x 8]
Groups: gear

  gear        term  estimate std.error statistic      p.value  conf.low  conf.high
1    3 (Intercept) 29.783784 4.5468925  6.550360 1.852532e-05 19.960820 39.6067478
2    3         cyl -1.831757 0.6018987 -3.043297 9.420695e-03 -3.132080 -0.5314336
3    4 (Intercept) 41.275000 5.9927925  6.887440 4.259099e-05 27.922226 54.6277739
4    4         cyl -3.587500 1.2587382 -2.850076 1.724783e-02 -6.392144 -0.7828565
5    5 (Intercept) 40.580000 3.3238331 12.208796 1.183209e-03 30.002080 51.1579205
6    5         cyl -3.200000 0.5308798 -6.027730 9.153118e-03 -4.889496 -1.5105036
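do() has been marked as superseded since dplyr 1.0.0. As an alternative, here is a base-R sketch of the same per-group fit using split() and lapply(). It uses only the built-in mtcars data. The name models_base is illustrative, and the columns are chosen to match the output above.

```r
# Split mtcars into one data frame per gear level (named "3", "4", "5")
by_gear <- split(mtcars, mtcars$gear)

# Fit mpg ~ cyl within each group and collect coefficients plus
# t-based confidence intervals in a data frame per group.
per_gear <- lapply(names(by_gear), function(g) {
  fit <- lm(mpg ~ cyl, data = by_gear[[g]])
  cf <- coef(summary(fit))
  ci <- confint(fit)
  data.frame(gear = as.numeric(g), term = rownames(cf),
             estimate = cf[, "Estimate"], std.error = cf[, "Std. Error"],
             statistic = cf[, "t value"], p.value = cf[, "Pr(>|t|)"],
             conf.low = ci[, 1], conf.high = ci[, 2], row.names = NULL)
})

# Stack the per-group tables into one data frame (6 rows: 3 gears x 2 terms)
models_base <- do.call(rbind, per_gear)
models_base
```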
