
Get confidence intervals for regression coefficients of “mlm” object returned by `lm()`

I'm running a multivariate regression with 2 outcome variables and 5 predictors. I would like to obtain the confidence intervals for all regression coefficients. Usually I call confint() on the object returned by lm(), but it doesn't seem to work for a multivariate regression model (an object of class mlm).

Here's a reproducible example.

library(car)
mod <- lm(cbind(income, prestige) ~ education + women, data=Prestige)
confint(mod) # doesn't return anything.

Is there an alternative way to do it? (I could just take the standard errors and multiply them by the right critical t value, but I was wondering whether there is an easier way.)

This comes from the predict.lm example. You want the interval = 'confidence' option.

x <- rnorm(15)
y <- x + rnorm(15)
predict(lm(y ~ x))
new <- data.frame(x = seq(-3, 3, 0.5))
predict(lm(y ~ x), new, se.fit = TRUE)
pred.w.clim <- predict(lm(y ~ x), new, interval = "confidence")
matplot(new$x, pred.w.clim,
        lty = c(1,2,2,3,3), type = "l", ylab = "predicted y")

confint won't return anything, because no "mlm" method is supported:

methods(confint)
#[1] confint.default confint.glm*    confint.lm      confint.nls*  

As you said, we can add and subtract a multiple of the standard error to get the upper and lower bounds of the confidence interval. You were probably going to do this via coef(summary(mod)), then use some *apply method to extract the standard errors. But my answer to "Obtain standard errors of regression coefficients for an “mlm” object returned by lm()" gives you a super efficient way to get the standard errors without going through summary. Applying std_mlm to your example model gives:

se <- std_mlm(mod)
#                 income   prestige
#(Intercept) 1162.299027 3.54212524
#education    103.731410 0.31612316
#women          8.921229 0.02718759
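
std_mlm is defined in the linked answer and is not reproduced here. As a rough stand-in, the sketch below builds the same kind of matrix (terms in rows, responses in columns) from vcov(). The name se_mlm_sketch is hypothetical and this is not the implementation from the linked answer; mtcars is used only so that the example runs without extra packages.

```r
## Hypothetical stand-in for std_mlm(); NOT the code from the linked answer.
se_mlm_sketch <- function(model) {
  beta <- coef(model)              # coefficient matrix: terms x responses
  se <- sqrt(diag(vcov(model)))    # standard errors, ordered response by response
  ## reshape the vector column by column so it lines up with coef(model)
  matrix(se, nrow = nrow(beta), dimnames = dimnames(beta))
}

## assumption: built-in mtcars data instead of Prestige
fit <- lm(cbind(mpg, disp) ~ wt, data = mtcars)
se_mlm_sketch(fit)
```

This assumes the model has no aliased (NA) coefficients; with aliased terms the reshaping step would need more care.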

Now, we define another small function to compute the lower and upper bounds:

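## add "mlm" method to generic function "confint"
## (signature follows the generic; "parm" is accepted but ignored here)
confint.mlm <- function (object, parm, level = 0.95, ...) {
  beta <- coef(object)
  se <- std_mlm(object)
  ## lower-tail t quantile, so "alpha" is negative
  alpha <- qt((1 - level) / 2, df = object$df.residual)
  list(lower = beta + alpha * se, upper = beta - alpha * se)
}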

## call "confint"
confint(mod)

#$lower
#                 income    prestige
#(Intercept) -3798.25140 -15.7825086
#education     739.05564   4.8005390
#women         -81.75738  -0.1469923
#
#$upper
#                income    prestige
#(Intercept)  814.25546 -1.72581876
#education   1150.70689  6.05505285
#women        -46.35407 -0.03910015

It is easy to interpret this. For example, for the response income, the 95% confidence intervals for the coefficients are

#(intercept)    (-3798.25140, 814.25546)
#  education    (739.05564, 1150.70689)
#      women    (-81.75738, -46.35407)

This seems to have been discussed recently (July 2018) on the R-devel list, so hopefully it will be fixed by the next version of R. A workaround proposed on that list is to use:

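confint.mlm <- function (object, level = 0.95, ...) {
  cf <- coef(object)
  ncfs <- as.numeric(cf)             # coefficient matrix flattened column by column
  a <- (1 - level)/2
  a <- c(a, 1 - a)                   # lower and upper tail probabilities
  fac <- qt(a, object$df.residual)   # t quantiles for both bounds
  pct <- stats:::format.perc(a, 3)   # unexported helper; may change between R versions
  ses <- sqrt(diag(vcov(object)))    # standard errors, named "response:term"
  ci <- ncfs + ses %o% fac
  setNames(data.frame(ci), pct)
}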

Test:

fit_mlm <- lm(cbind(mpg, disp) ~ wt, mtcars)
confint(fit_mlm)

Gives:

                       2.5 %     97.5 %
mpg:(Intercept)    33.450500  41.119753
mpg:wt             -6.486308  -4.202635
disp:(Intercept) -204.091436 -58.205395
disp:wt            90.757897 134.198380
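
One way to sanity-check intervals like these: each response in an "mlm" fit is estimated by ordinary least squares on the same predictors with the same residual degrees of freedom, so fitting the responses one at a time and calling the built-in lm method should give the same per-coefficient intervals. The sketch below does that for the mtcars model; the name ci_by_response is only illustrative.

```r
## Cross-check: fit each response separately and use confint() on plain "lm" fits.
responses <- c("mpg", "disp")
ci_by_response <- lapply(setNames(responses, responses), function(resp) {
  ## reformulate() builds e.g. mpg ~ wt from the response name
  fit <- lm(reformulate("wt", response = resp), data = mtcars)
  confint(fit, level = 0.95)
})

ci_by_response$mpg    # compare with the "mpg:" rows
ci_by_response$disp   # compare with the "disp:" rows
```

Note that this only checks the marginal, per-coefficient intervals; it says nothing about joint inference across the two responses.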

Personally, I like it in a clean tibble way (using broom::tidy would be even better, but it currently has an issue):

library(tidyverse)
confint(fit_mlm) %>% 
  rownames_to_column() %>% 
  separate(rowname, c("response", "term"), sep=":")

Gives:

  response        term       2.5 %     97.5 %
1      mpg (Intercept)   33.450500  41.119753
2      mpg          wt   -6.486308  -4.202635
3     disp (Intercept) -204.091436 -58.205395
4     disp          wt   90.757897 134.198380
