Plot regression coefficient with confidence intervals

Suppose I have 2 data frames, one for 2015 and one for 2016. I want to run a regression on each data frame and plot one of the coefficients from each regression with its confidence interval. For example:

set.seed(1020022316)
library(dplyr)
library(stargazer)

df16 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
  select(-e)

df15 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
  select(-e)

lm16 <- lm(y ~ x1 + t, data = df16)

lm15 <- lm(y ~ x1 + t, data = df15)

stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)

I want to plot the coefficient on t for each year (t = 1.558 at x = 2015 and t = 2.797 at x = 2016) with their respective 95% CIs. What is the best way of doing this?

I could do it 'by hand', but I hope there is a better way.

library(ggplot2)
df.plot <-
  data.frame(
    y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
    x = c(2015, 2016),
    lb = c(
      confint(lm15, 't', level = 0.95)[1],
      confint(lm16, 't', level = 0.95)[1]
    ),
    ub = c(
      confint(lm15, 't', level = 0.95)[2],
      confint(lm16, 't', level = 0.95)[2]
    )
  )
df.plot %>% ggplot(aes(x, y)) + geom_point() +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) + 
  geom_hline(aes(yintercept=0), linetype="dashed")

"Best" here means: figure quality (it looks nice), code elegance, and ease of extension to more than 2 regressions.

This is a bit too long for a comment, so I am posting it as a partial answer.

It is unclear from your post whether your main problem is getting the data into the right shape or the plotting itself. But to follow up on one of the comments, let me show you how to run several models using dplyr and broom in a way that makes plotting easy. Consider the mtcars dataset:

 library(dplyr)
 library(broom)
 library(ggplot2)  # needed for the plots further down
 models <- mtcars %>% group_by(cyl) %>% 
           do(data.frame(tidy(lm(mpg ~ disp, data = .),conf.int=TRUE )))

 head(models) # I have abbreviated the following output a bit

    cyl        term estimate std.error statistic   p.value conf.low conf.high
  (dbl)       (chr)    (dbl)     (dbl)     (dbl)     (dbl)    (dbl)     (dbl)
     4 (Intercept)  40.8720    3.5896     11.39 0.0000012   32.752  48.99221
     4        disp  -0.1351    0.0332     -4.07 0.0027828   -0.210  -0.06010
     6 (Intercept)  19.0820    2.9140      6.55 0.0012440   11.591  26.57264
     6        disp   0.0036    0.0156      0.23 0.8259297   -0.036   0.04360

You see that this gives you all coefficients and confidence intervals in one tidy data frame, which makes plotting with ggplot easier. For instance, if your datasets have identical columns, you could add a year identifier to each of them (e.g. df1$year <- 2000; df2$year <- 2001, etc.) and bind them together afterwards (e.g. using bind_rows, or you can use the .id option of bind_rows). Then you can use the year identifier instead of cyl in the above example.
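The two ways of tagging each data set with a year before stacking can be sketched as follows. The data frames `df_a` and `df_b` and their values are made up for illustration. Passing a named list to `bind_rows()` makes the `.id` column take its values from the list names.

```r
library(dplyr)

# Hypothetical toy data sets with identical columns (illustration only)
df_a <- data.frame(x1 = c(1, 2, 3), y = c(2.1, 3.9, 6.2))
df_b <- data.frame(x1 = c(1, 2, 3), y = c(1.0, 2.2, 2.9))

# Option 1: add the identifier by hand, then stack the data frames
df_a_tagged <- df_a
df_b_tagged <- df_b
df_a_tagged$year <- 2000
df_b_tagged$year <- 2001
stacked_manual <- bind_rows(df_a_tagged, df_b_tagged)

# Option 2: let bind_rows() build the identifier from the list names
stacked_id <- bind_rows(list("2000" = df_a, "2001" = df_b), .id = "year")
```

With the second option the identifier is a character column, which ggplot treats as a discrete axis.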

The plotting is then simple. To use the mtcars data again, let's plot the coefficients for disp only (though you could also use faceting, grouping, etc.):

 ggplot(filter(models, term=="disp"), aes(x=cyl, y=estimate)) + 
          geom_point() + geom_errorbar(aes(ymin=conf.low, ymax=conf.high))

To use your data:

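 # Name the list so that the .id column holds the actual years
 df <- bind_rows(list("2015" = df15, "2016" = df16), .id = "years")

 # The pipeline ends in a ggplot object, so it is printed rather than stored as "models"
 df %>% group_by(years) %>% 
           do(data.frame(tidy(lm(y ~ x1+t, data = .),conf.int=TRUE ))) %>%
           filter(term == "t") %>% 
           ggplot(aes(x=years, y=estimate)) + geom_point() + 
           geom_errorbar(aes(ymin=conf.low, ymax=conf.high)) 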

Note that you can easily add more models just by binding more data to the main data frame. If you want to plot more than one coefficient, you can also use faceting, grouping, or position dodging to adjust the look of the plot.
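As a rough sketch of the dodging idea, the following plots both slope coefficients for each year side by side. The data are simulated in the spirit of the question; the helper `make_df`, the sample size, and the dodge width are assumptions chosen for illustration only.

```r
library(dplyr)
library(broom)
library(ggplot2)

set.seed(1)

# Hypothetical helper: simulate one year of data with given true slopes
make_df <- function(b_x1, b_t, n = 500) {
  x1 <- rnorm(n, 0, 2)
  t  <- sample(c(0, 1), n, replace = TRUE)
  data.frame(x1 = x1, t = t, y = b_x1 * x1 + b_t * t + rnorm(n, 0, 10))
}

df <- bind_rows(list("2015" = make_df(0.75, 2.5),
                     "2016" = make_df(0.5, 2)),
                .id = "year")

# One row per year and term, with 95% confidence bounds
coefs <- df %>%
  group_by(year) %>%
  do(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE)) %>%
  ungroup() %>%
  filter(term != "(Intercept)")

# Use the same dodge for points and error bars so they stay aligned
dodge <- position_dodge(width = 0.3)
p <- ggplot(coefs, aes(x = year, y = estimate, colour = term)) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.1, position = dodge) +
  geom_hline(yintercept = 0, linetype = "dashed")
p
```

If the coefficients live on very different scales, `facet_wrap(~ term, scales = "free_y")` may read better than dodging.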

This is the solution I have right now:

gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}

df.plot <- lapply(list(lm15,lm16), gen_df_plot, coef_name = 't')

df.plot <- data.table::rbindlist(df.plot)

df.plot$x <- as.factor(c(2015, 2016))

df.plot %>% ggplot(aes(x, y)) + geom_point(size=4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") + 
  geom_hline(aes(yintercept=0), linetype="dashed") + theme_bw()

I don't love it, but it works.

Here is a more general version of the code. I have changed how "x" is defined so that you don't have to worry about alphabetic reordering of the factor.

#
# Paul Gronke and Paul Manson
# Early Voting Information Center at Reed College
#
# August 27, 2019
#
#
# Code to plot a single coefficient from multiple models, provided
# as an easier alternative to "coefplot" and "dotwhisker". Some users
# may find those packages more capable
#
# Code adapted from https://stackoverflow.com/questions/35582052/plot-regression-coefficient-with-confidence-intervals


library(dplyr)    # provides the %>% pipe used below
library(ggplot2)

# gen_df_plot function will create a tidy data frame for your plot
#   Currently set up to display 95% confidence intervals

gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}

# Populate the data frame with a list of your model results.

df.plot <- lapply(list(model1,      # List your models here
                       model2), 
                  gen_df_plot, 
                  coef_name = 'x1') # Coefficient name

  # Convert the list to a tidy data frame

df.plot <- data.table::rbindlist(df.plot)

# Provide the coefficient or regression labels below, in the
# order that you want them to appear. The "levels=unique(.)" parameter
# overrides R's desire to order the factor alphabetically

df.plot$x <- c("Group 1", 
               "Group 2") %>%
  factor(., levels = unique(.),
         ordered = TRUE)

# Create your plot

df.plot %>% ggplot(aes(x, y)) + 
  geom_point(size=4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") + 
  geom_hline(aes(yintercept=0), linetype="dashed") + 
  theme_bw() +
  ggtitle("Comparing Coefficients") +
  ylab("Coefficient Value")
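
As a usage sketch, the same helper can be applied to models fitted on subsets of mtcars. The choice of mtcars, the `mpg ~ wt` formula, and the group labels are assumptions for illustration. In this variant the labels are taken from the names of the model list, so they cannot drift out of step with the models.

```r
library(ggplot2)

# Same idea as gen_df_plot above: one row per model with estimate and 95% CI
gen_df_plot <- function(reg, coef_name) {
  ci <- confint(reg, coef_name, level = 0.95)
  data.frame(y = coef(reg)[[coef_name]], lb = ci[1], ub = ci[2])
}

# Illustrative models: mpg regressed on weight, by number of cylinders
models <- list(
  "4 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 4)),
  "6 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 6)),
  "8 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 8))
)

# Stack the per-model rows into one data frame
df.plot <- do.call(rbind, lapply(models, gen_df_plot, coef_name = "wt"))

# Keep the labels in list order rather than alphabetical order
df.plot$x <- factor(names(models), levels = names(models))

p <- ggplot(df.plot, aes(x, y)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_bw() +
  ylab("Coefficient Value")
p
```

Adding another regression then only requires adding one more named entry to `models`.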
