Plot regression coefficient with confidence intervals

Question

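Suppose I have two data frames, one for 2015 and one for 2016. I want to run a regression on each data frame and plot one of the coefficients from each regression together with its confidence interval. For example: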

set.seed(1020022316)
library(dplyr)
library(stargazer)

df16 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
  select(-e)

df15 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
  select(-e)

lm16 <- lm(y ~ x1 + t, data = df16)

lm15 <- lm(y ~ x1 + t, data = df15)

stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)

I would like to plot t=1.558, x=2015 and t=2.797, x=2016, each with its .95 CI. What is the best way of doing so?

I could do it "by hand", but I hope there is a better way.

library(ggplot2)
df.plot <-
  data.frame(
    y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
    x = c(2015, 2016),
    lb = c(
      confint(lm15, 't', level = 0.95)[1],
      confint(lm16, 't', level = 0.95)[1]
    ),
    ub = c(
      confint(lm15, 't', level = 0.95)[2],
      confint(lm16, 't', level = 0.95)[2]
    )
  )
df.plot %>% ggplot(aes(x, y)) + geom_point() +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) + 
  geom_hline(aes(yintercept=0), linetype="dashed")

Best: figure quality (looks nice), code elegance, easy to extend (more than 2 regressions)

Answer 1

This is too long for a comment, so I am posting it as a partial answer.

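It is not clear from your post whether your main problem is getting the data into the right shape or the plotting itself. To follow up on one of the comments, though, let me show you how to run several models with dplyr and broom in a way that makes plotting easy. Consider the mtcars dataset: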

 library(dplyr)
 library(broom)
 models <- mtcars %>% group_by(cyl) %>% 
           do(data.frame(tidy(lm(mpg ~ disp, data = .),conf.int=T )))

 head(models) # I have abbreviated the following output a bit

    cyl        term estimate std.error statistic   p.value conf.low conf.high
  (dbl)       (chr)    (dbl)     (dbl)     (dbl)     (dbl)    (dbl)     (dbl)
     4 (Intercept)  40.8720    3.5896     11.39 0.0000012   32.752  48.99221
     4        disp  -0.1351    0.0332     -4.07 0.0027828   -0.210  -0.06010
     6 (Intercept)  19.0820    2.9140      6.55 0.0012440   11.591  26.57264
     6        disp   0.0036    0.0156      0.23 0.8259297   -0.036   0.04360

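As you can see, this gives you all the coefficients and confidence intervals in one tidy data frame, which makes plotting with ggplot much easier. For instance, if your datasets have identical columns, you could add a year identifier to each of them (e.g. df1$year <- 2000; df2$year <- 2001, etc.) and then bind them together (e.g. with bind_rows, or skip the manual step and use the .id option of bind_rows). You can then use the year identifier instead of cyl in the example above.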
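The two ways of tagging and stacking the data frames mentioned above can be sketched as follows. This is only an illustration: `df_a`, `df_b`, and the years 2000 and 2001 are made-up placeholders, not the data from the question.

```r
library(dplyr)

# Two small hypothetical data frames with identical columns
df_a <- data.frame(x = c(1, 2, 3), y = c(2.1, 3.9, 6.2))
df_b <- data.frame(x = c(1, 2, 3), y = c(1.0, 2.2, 2.9))

# Option 1: let bind_rows() build the identifier from the list names.
# The resulting "year" column is a character vector.
stacked_id <- bind_rows(list("2000" = df_a, "2001" = df_b), .id = "year")

# Option 2: add the identifier by hand, then stack the data frames
df_a$year <- 2000
df_b$year <- 2001
stacked_manual <- bind_rows(df_a, df_b)
```

Naming the list elements matters for option 1: with unnamed inputs, `.id` falls back to the positions "1", "2", and so on, which then have to be mapped back to years by hand.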

Plotting is then simple. Using the mtcars data again, let us plot only the coefficient on disp (though you could also use faceting, grouping, etc.):

 ggplot(filter(models, term=="disp"), aes(x=cyl, y=estimate)) + 
          geom_point() + geom_errorbar(aes(ymin=conf.low, ymax=conf.high))

To use your data:

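 library(ggplot2)

 # Name the list elements so that .id stores the actual year rather than "1", "2"
 df <- bind_rows(list("2015" = df15, "2016" = df16), .id = "years")

 # Fit one model per year, keep the coefficient on t, and plot it with its 95% CI
 df %>% group_by(years) %>% 
           do(data.frame(tidy(lm(y ~ x1+t, data = .),conf.int=TRUE ))) %>%
           filter(term == "t") %>% 
           ggplot(aes(x=years, y=estimate)) + geom_point() + 
           geom_errorbar(aes(ymin=conf.low, ymax=conf.high))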

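Note that you can easily add more and more models simply by binding more data to the main data frame. If you want to plot several coefficients, you can also use faceting, grouping, or position dodge to adjust the look of the plot accordingly.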
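As a rough sketch of the position-dodge idea, the following plots two coefficients per group side by side. It uses mtcars rather than the data from the question, and the model `mpg ~ disp + wt` is an arbitrary choice made only for illustration.

```r
library(dplyr)
library(broom)
library(ggplot2)

# One model per cylinder group; keep every coefficient except the intercept
coefs <- mtcars %>%
  group_by(cyl) %>%
  do(tidy(lm(mpg ~ disp + wt, data = .), conf.int = TRUE)) %>%
  ungroup() %>%
  filter(term != "(Intercept)")

# Use the same dodge for points and error bars so that they stay aligned
dodge <- position_dodge(width = 0.4)

p <- ggplot(coefs, aes(x = factor(cyl), y = estimate, colour = term)) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.2, position = dodge) +
  geom_hline(yintercept = 0, linetype = "dashed")
print(p)
```

When the coefficients are on very different scales, as disp and wt are here, replacing the dodge with `facet_wrap(~ term, scales = "free_y")` is likely to give a more readable figure.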

Answer 2

Here is the solution I have for now:

gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}

df.plot <- lapply(list(lm15,lm16), gen_df_plot, coef_name = 't')

df.plot <- data.table::rbindlist(df.plot)

df.plot$x <- as.factor(c(2015, 2016))

df.plot %>% ggplot(aes(x, y)) + geom_point(size=4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") + 
  geom_hline(aes(yintercept=0), linetype="dashed") + theme_bw()

I don't love it, but it works.
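When the models are already fitted, one possible alternative to a hand-written helper is to keep them in a named list and let broom extract the estimates and intervals. The sketch below assumes broom and dplyr are installed and uses three mtcars models with made-up names purely as stand-ins for lm15 and lm16.

```r
library(dplyr)
library(broom)

# Hypothetical fitted models; the list names become the x-axis labels
models <- list(
  "4 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 4)),
  "6 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 6)),
  "8 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 8))
)

# tidy() returns one row per term, with conf.low and conf.high columns;
# bind_rows(.id = ...) records which model each row came from
coef_df <- lapply(models, tidy, conf.int = TRUE, conf.level = 0.95) %>%
  bind_rows(.id = "model") %>%
  filter(term == "wt")
```

Adding a regression then means adding one element to the list, and `coef_df` can be passed to the same ggplot call with `x = model`, `y = estimate`, `ymin = conf.low`, and `ymax = conf.high`.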

Answer 3

Here is generic code. I changed how "x" is defined so that you do not have to worry about the factor being reordered alphabetically.

#
# Paul Gronke and Paul Manson
# Early Voting Information Center at Reed College
#
# August 27, 2019
#
#
# Code to plot a single coefficient from multiple models, provided
# as an easier alternative to "coefplot" and "dotwhisker". Some users
# may find those packages more capable
#
# Code adapted from https://stackoverflow.com/questions/35582052/plot-regression-coefficient-with-confidence-intervals


# gen_df_plot function will create a tidy data frame for your plot
#   Currently set up to display 95% confidence intervals

gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}

# Populate the data frame with a list of your model results.

df.plot <- lapply(list(model1,      # List your models here
                       model2), 
                  gen_df_plot, 
                  coef_name = 'x1') # Coefficient name

  # Convert the list to a tidy data frame

df.plot <- data.table::rbindlist(df.plot)

# Provide the coefficient or regression labels below, in the
# order that you want them to appear. The "levels=unique(.)" parameter
# overrides R's desire to order the factor alphabetically

df.plot$x <- c("Group 1", 
               "Group 2") %>%
  factor(., levels = unique(.),
         ordered = TRUE)

# Create your plot

df.plot %>% ggplot(aes(x, y)) + 
  geom_point(size=4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") + 
  geom_hline(aes(yintercept=0), linetype="dashed") + 
  theme_bw() +
  ggtitle("Comparing Coefficients") +
  ylab("Coefficient Value")
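The effect of `levels = unique(.)` on the ordering can be seen in a small standalone example. The labels below are invented for illustration only.

```r
# Hypothetical labels, listed in the order they should appear on the x axis
labels <- c("Treatment", "Control", "Baseline")

# Default behaviour: levels are sorted, so the plotting order changes
default_factor <- factor(labels)
levels(default_factor)
# [1] "Baseline"  "Control"   "Treatment"

# With levels = unique(labels), levels follow the order of first appearance
kept_order <- factor(labels, levels = unique(labels))
levels(kept_order)
# [1] "Treatment" "Control"   "Baseline"
```

ggplot2 places discrete values along the axis in level order, which is why fixing the levels fixes the order of the points.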
