Summary data frame from several multiple regression outputs
I am doing multiple OLS regressions. I have used the following lm function:
GroupNetReturnsStockPickers <- read.csv("GroupNetReturnsStockPickers.csv", header=TRUE, sep=",", dec=".")
ModelGroupNetReturnsStockPickers <- lm(StockPickersNet ~ Mkt.RF+SMB+HML+WML, data=GroupNetReturnsStockPickers)
names(GroupNetReturnsStockPickers)
summary(ModelGroupNetReturnsStockPickers)
Which gives me the following summary output:
Call:
lm(formula = StockPickersNet ~ Mkt.RF + SMB + HML + WML, data = GroupNetReturnsStockPickers)
Residuals:
Min 1Q Median 3Q Max
-0.029698 -0.005069 -0.000328 0.004546 0.041948
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.655e-05 5.981e-04 0.078 0.938
Mkt.RF -1.713e-03 1.202e-02 -0.142 0.887
SMB 3.006e-02 2.545e-02 1.181 0.239
HML 1.970e-02 2.350e-02 0.838 0.403
WML 1.107e-02 1.444e-02 0.766 0.444
Residual standard error: 0.009029 on 251 degrees of freedom
Multiple R-squared: 0.01033, Adjusted R-squared: -0.005445
F-statistic: 0.6548 on 4 and 251 DF, p-value: 0.624
This is perfect. However, I am running a total of 10 multiple OLS regressions, and I wish to create my own summary output, in a data frame, where I extract the intercept estimate, its t-value, and its p-value for each of the 10 analyses. Hence it would be a 3x10 table, where the column names would be Model1, Model2, ..., Model10, and the row names Value, t-value and p-Value.
I appreciate any help.
There are a few packages that do this (stargazer and texreg), as well as this code for outreg.
In any case, if you are only interested in the intercept, here is one approach:
# Estimate a bunch of different models, stored in a list
fits <- list() # Create empty list to store models
fits$model1 <- lm(Ozone ~ Solar.R, data = airquality)
fits$model2 <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fits$model3 <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
# Combine the results for the intercept
do.call(cbind, lapply(fits, function(z) summary(z)$coefficients["(Intercept)", ]))
# RESULT:
# model1 model2 model3
# Estimate 18.598727772 7.724604e+01 -64.342078929
# Std. Error 6.747904163 9.067507e+00 23.054724347
# t value 2.756222869 8.518995e+00 -2.790841389
# Pr(>|t|) 0.006856021 1.052118e-13 0.006226638
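To get closer to the layout asked for in the question (rows Value, t-value and p-Value, one column per model), the same idea can be narrowed to three statistics. This is a minimal base-R sketch on the built-in airquality data; the helper name intercept_stats and the Model1, Model2, Model3 labels are illustrative assumptions, not part of the original answer.

```r
# Fit a few models and keep them in a named list (names become column names)
fits <- list(
  Model1 = lm(Ozone ~ Solar.R, data = airquality),
  Model2 = lm(Ozone ~ Solar.R + Wind, data = airquality),
  Model3 = lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
)

# Hypothetical helper: estimate, t value and p-value of the intercept
intercept_stats <- function(fit) {
  coefs <- summary(fit)$coefficients
  coefs["(Intercept)", c("Estimate", "t value", "Pr(>|t|)")]
}

# sapply() returns a 3 x n matrix with one column per model
result <- as.data.frame(sapply(fits, intercept_stats))
rownames(result) <- c("Value", "t-value", "p-Value")
result
```

With ten models in the list, the same call should give the 3x10 data frame described in the question.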
Look at the broom package, which was created to do exactly what you are asking for. The only difference is that it puts the models into rows and the different statistics into columns. I understand that you would prefer the opposite, but you can work around that afterwards if it is really necessary.
To give you an example, the function tidy() converts a model output into a data frame.
model <- lm(mpg ~ cyl, data=mtcars)
summary(model)
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.9814 -2.1185 0.2217 1.0717 7.5186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
cyl -2.8758 0.3224 -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
And
library(broom)
tidy(model)
yields the following data frame:
term estimate std.error statistic p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2 cyl -2.87579 0.3224089 -8.919699 6.112687e-10
Look at ?tidy.lm to see more options, for instance for confidence intervals.
To combine the output of your ten models into one data frame, you could use
library(dplyr)
bind_rows(one, two, three, ... , .id="models")
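The placeholders one, two, three above stand for tidied model outputs. As a rough sketch of how that might look end to end, assuming the fits are kept in a named list (the Model1 to Model3 names and the mtcars formulas are made up for illustration), bind_rows() can take the list directly and record the list names in the .id column:

```r
library(broom)
library(dplyr)

# Illustrative models; in practice these would be your ten lm() fits
fits <- list(
  Model1 = lm(mpg ~ cyl, data = mtcars),
  Model2 = lm(mpg ~ cyl + wt, data = mtcars),
  Model3 = lm(mpg ~ cyl + wt + hp, data = mtcars)
)

# tidy() each fit, then stack; .id stores the list names in a "model" column
all_terms <- bind_rows(lapply(fits, tidy), .id = "model")

# Keep only the intercept rows and the three statistics of interest
intercepts <- all_terms %>%
  filter(term == "(Intercept)") %>%
  select(model, estimate, statistic, p.value)
intercepts
```

Passing a named list avoids typing each tidied object by hand, and the names carry through as model labels.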
Or, if your different models come from regressions using the same data frame, you can combine it with dplyr:
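# One regression per gear group; tidy() returns the coefficients with confidence intervals
models <- mtcars %>% group_by(gear) %>% do(data.frame(tidy(lm(mpg ~ cyl, data = .), conf.int = TRUE)))
models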
Source: local data frame [6 x 8]
Groups: gear
gear term estimate std.error statistic p.value conf.low conf.high
1 3 (Intercept) 29.783784 4.5468925 6.550360 1.852532e-05 19.960820 39.6067478
2 3 cyl -1.831757 0.6018987 -3.043297 9.420695e-03 -3.132080 -0.5314336
3 4 (Intercept) 41.275000 5.9927925 6.887440 4.259099e-05 27.922226 54.6277739
4 4 cyl -3.587500 1.2587382 -2.850076 1.724783e-02 -6.392144 -0.7828565
5 5 (Intercept) 40.580000 3.3238331 12.208796 1.183209e-03 30.002080 51.1579205
6 5 cyl -3.200000 0.5308798 -6.027730 9.153118e-03 -4.889496 -1.5105036
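In recent dplyr releases do() is marked as superseded. The following is a sketch of the same per-group regression using group_modify(), assuming dplyr 0.8.1 or later; it is an alternative spelling, not the code the answer originally used.

```r
library(broom)
library(dplyr)

# Fit mpg ~ cyl separately within each gear group and tidy each fit;
# inside the formula, .x is the data for the current group
models <- mtcars %>%
  group_by(gear) %>%
  group_modify(~ tidy(lm(mpg ~ cyl, data = .x), conf.int = TRUE)) %>%
  ungroup()
models
```

The result should have the same columns as the output above: gear, term, estimate, std.error, statistic, p.value, conf.low and conf.high.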