简体   繁体   English

在R中汇总回归输出

[英]aggregating regression outputs in R

I am performing multiple pooled cross-sectional regressions with a loop function and stored the regression outputs in a list (regression). 我正在使用循环函数执行多个池横截面回归,并将回归输出存储在列表中(回归)。 What i would like to do now is to efficiently obtain the average coefficients, the average t-stats as well as the average adj.r squared. 我现在想做的是有效地获得平均系数,平均t统计量以及平均调整平方。

I put up the following code already: 我已经放置了以下代码:

library(plm)
data("Grunfeld", package="plm")

# create list with regression outputs
regression <- list()

# Regression on past six-year subsets of Grunfeld in every year from 1940 to 1950
for(t in 1940:1950){

  regression[[as.character(t)]] <- lm(inv ~ value + capital, 
                              subset(Grunfeld, year<=t & year>=t-5))
}

This way I obtain the desired regression output stored in a list (regression). 这样,我可以获得存储在列表中的所需回归输出(回归)。 What i would like to do now is to efficiently obtain the average coefficients, the average t-stats as well as the average adj.r squared. 我现在想做的是有效地获得平均系数,平均t统计量以及平均调整平方。

I already tried to calculated the mean of all the adj. 我已经尝试计算所有调整的平均值。 r squared with the following: r与以下项平方:

mean(lapply(regression, function(x) summary(x)$adj.r.squared))

However it seems that I am using the mean function wrong as i get the following error. 但是似乎我使用均值函数错误,因为我得到以下错误。

Warning message:
In mean.default(lapply(regression, function(x) summary(x)$adj.r.squared)) :
  argument is not numeric or logical: returning NA

Further i came up with the following to "extract" the coefficients. 此外,我想出了以下“提取”系数的方法。

lapply(regression, function(x) summary(x)$coefficients)

How can I quickly obtain the average individual coefficients from this lapply output? 如何快速从该不正确的输出中获取平均个体系数? (ie extracting each row individually and calculate the respective mean over the years.) (即,分别提取每一行并计算多年来的平均值。)

$`1940`
               Estimate   Std. Error    t value     Pr(>|t|)
(Intercept) -3.65239712 14.647050149 -0.2493606 8.039783e-01
value        0.08283141  0.006873563 12.0507230 2.615793e-17
capital      0.11033307  0.091543522  1.2052526 2.330857e-01

$`1941`
                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept) -13.77258038 16.677399231 -0.825823 4.123477e-01
value         0.08614094  0.007258571 11.867480 4.904857e-17
capital       0.18680229  0.094849038  1.969470 5.376624e-02

....

You almost had it right! 您几乎是对的! Try this: 尝试这个:

> sapply(regression, function(x) mean(summary(x)$adj.r.squared))
     1940      1941      1942      1943      1944      1945      1946      1947      1948      1949 
0.7230061 0.7293396 0.7399216 0.7770505 0.7998859 0.8413422 0.8571037 0.8561229 0.8348950 0.8357761 
     1950 
0.8324654 

You may as well use lapply() in the above. 您也可以在上面使用lapply() The same can be done for any coefficient, t-test, once you identify how to extract them from the coefficient table. 一旦确定了如何从系数表中提取系数,就可以对任何系数t检验执行相同的操作。

To extract coefficients for value you can proceed as follows: 要提取value系数,可以按以下步骤进行:

lapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])

A more compact version would be: 一个更紧凑的版本是:

sapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])

From the above you can obtain means as follows: 从上面您可以获得以下方法:

> (x <- t(sapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])))
       Estimate  Std. Error  t value     Pr(>|t|)
1940 0.08283141 0.006873563 12.05072 2.615793e-17
1941 0.08614094 0.007258571 11.86748 4.904857e-17
1942 0.09018745 0.007711639 11.69498 8.898811e-17
1943 0.09945565 0.007751087 12.83119 1.886416e-18
1944 0.10568804 0.007376523 14.32762 1.516617e-20
1945 0.11358598 0.006722166 16.89723 7.314450e-24
1946 0.12227763 0.006781509 18.03104 3.203995e-25
1947 0.12599497 0.007199027 17.50167 1.356383e-24
1948 0.12605599 0.008005481 15.74621 2.030259e-22
1949 0.12951740 0.008452725 15.32256 7.175275e-22
1950 0.13647072 0.009530406 14.31951 1.555615e-20
> colMeans(x)
    Estimate   Std. Error      t value     Pr(>|t|) 
1.107460e-01 7.605700e-03 1.459911e+01 1.510115e-17 

This said, this looks suspiciously like the Fama-MacBeth estimates: Fama MacBeth standard errors in R . 也就是说,这看起来像Fama-MacBeth的估计值: R中的Fama MacBeth标准错误 These can be easily obtained using pmg() in plm . 这些可以使用plm pmg()轻松获得。

You could try: 您可以尝试:

 library(reshape2)
  dcast(melt(lapply(regression, 
       function(x) summary(x)$coefficients)), Var1~Var2, value.var="value", mean)
 #         Var1    Estimate Std. Error   t value     Pr(>|t|)
 #1 (Intercept) -16.7072859 16.0876958 -1.029145 3.320868e-01
 #2       value   0.1107460  0.0076057 14.599109 1.510115e-17
 #3     capital   0.1279743  0.0685896  1.833861 9.389504e-02

Or 要么

 Reduce(`+`,lapply(regression, function(x) summary(x)$coefficients))/length(regression)
 #                    Estimate Std. Error   t value     Pr(>|t|)
 #(Intercept) -16.7072859 16.0876958 -1.029145 3.320868e-01
 #value         0.1107460  0.0076057 14.599109 1.510115e-17
 #capital       0.1279743  0.0685896  1.833861 9.389504e-02

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM