Plot regression coefficient with confidence intervals
Suppose I have two data frames, one for 2015 and one for 2016. I want to run a regression on each data frame and plot one coefficient from each regression with its confidence interval. For example:
set.seed(1020022316)
library(dplyr)
library(stargazer)
df16 <- data.frame(
x1 = rnorm(1000, 0, 2),
t = sample(c(0, 1), 1000, T),
e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
select(-e)
df15 <- data.frame(
x1 = rnorm(1000, 0, 2),
t = sample(c(0, 1), 1000, T),
e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
select(-e)
lm16 <- lm(y ~ x1 + t, data = df16)
lm15 <- lm(y ~ x1 + t, data = df15)
stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)
I want to plot t=1.558, x=2015 and t=2.797, x=2016 with their respective 95% CIs. What is the best way of doing this?
I could do it 'by hand', but I hope there is a better way.
library(ggplot2)
df.plot <-
data.frame(
y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
x = c(2015, 2016),
lb = c(
confint(lm15, 't', level = 0.95)[1],
confint(lm16, 't', level = 0.95)[1]
),
ub = c(
confint(lm15, 't', level = 0.95)[2],
confint(lm16, 't', level = 0.95)[2]
)
)
df.plot %>% ggplot(aes(x, y)) + geom_point() +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
geom_hline(aes(yintercept=0), linetype="dashed")
By "best" I mean: figure quality (looks nice), code elegance, and ease of extension (more than 2 regressions).
This is a bit too long for a comment, so I am posting it as a partial answer.
It is unclear from your post whether your main problem is getting the data into the right shape or the plotting itself. But just to follow up on one of the comments, let me show you how to run several models using dplyr and broom in a way that makes plotting easy. Consider the mtcars dataset:
library(dplyr)
library(broom)
models <- mtcars %>% group_by(cyl) %>%
do(data.frame(tidy(lm(mpg ~ disp, data = .),conf.int=T )))
head(models) # I have abbreviated the following output a bit
cyl term estimate std.error statistic p.value conf.low conf.high
(dbl) (chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
4 (Intercept) 40.8720 3.5896 11.39 0.0000012 32.752 48.99221
4 disp -0.1351 0.0332 -4.07 0.0027828 -0.210 -0.06010
6 (Intercept) 19.0820 2.9140 6.55 0.0012440 11.591 26.57264
6 disp 0.0036 0.0156 0.23 0.8259297 -0.036 0.04360
You see that this gives you all coefficients and confidence intervals in one nice data frame, which makes plotting with ggplot easier. For instance, if your datasets have identical columns, you could add a year identifier to each of them (e.g. df1$year <- 2000; df2$year <- 2001, etc.) and bind them together afterwards (e.g. using bind_rows, or you can use bind_rows's .id option). Then you can use the year identifier instead of cyl in the above example.
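To make the two ways of tagging the data concrete, here is a minimal sketch. The data frames df_a and df_b and their values are made up purely for illustration; only bind_rows and its .id argument come from dplyr.

```r
library(dplyr)

# Two toy data frames with identical columns (made-up values).
df_a <- data.frame(x = c(1, 2, 3), y = c(2.1, 3.9, 6.2))
df_b <- data.frame(x = c(1, 2, 3), y = c(1.0, 2.2, 2.9))

# Option 1: let bind_rows() build the identifier from the list names.
# The .id column is character, so convert it if numeric years are needed.
by_id <- bind_rows(list("2000" = df_a, "2001" = df_b), .id = "year") %>%
  mutate(year = as.numeric(year))

# Option 2: add the identifier by hand, then bind.
df_a$year <- 2000
df_b$year <- 2001
manual <- bind_rows(df_a, df_b)
```

Both results carry the same year column; the .id route just saves you from editing each data frame separately.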
The plotting is then simple. To use the mtcars data again, let's plot the coefficients for disp only (though you could also use faceting, grouping, etc.):
ggplot(filter(models, term=="disp"), aes(x=cyl, y=estimate)) +
geom_point() + geom_errorbar(aes(ymin=conf.low, ymax=conf.high))
To use your data:
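# Name the inputs so the .id column holds the years rather than "1" and "2"
df <- bind_rows("2015" = df15, "2016" = df16, .id = "years")
# Pipe straight into ggplot() without assigning, so the plot is displayed
df %>% group_by(years) %>%
do(data.frame(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE))) %>%
filter(term == "t") %>%
ggplot(aes(x = years, y = estimate)) + geom_point() +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high))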
Note that you can easily add more models just by binding more data to the main data frame. You can also easily use faceting, grouping or position-dodging to adjust the look of the corresponding plot if you want to plot more than one coefficient.
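As a rough sketch of the dodging idea, the following fits a two-predictor model per cylinder group in mtcars and draws both slope coefficients side by side. The model formula (mpg ~ disp + wt) and the dodge width are my own choices for illustration, not something taken from the question.

```r
library(dplyr)
library(broom)
library(ggplot2)

# Fit mpg ~ disp + wt separately for each cylinder group and keep
# the estimates together with their 95% confidence intervals.
coefs <- mtcars %>%
  group_by(cyl) %>%
  do(tidy(lm(mpg ~ disp + wt, data = .), conf.int = TRUE)) %>%
  ungroup() %>%
  filter(term != "(Intercept)")

# Dodge the two coefficients side by side within each cylinder group.
dodge <- position_dodge(width = 0.4)
p <- ggplot(coefs, aes(x = factor(cyl), y = estimate, colour = term)) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.2, position = dodge) +
  geom_hline(yintercept = 0, linetype = "dashed")
print(p)
```

Because disp and wt are on very different scales, facet_wrap(~ term, scales = "free_y") may be easier to read than dodging here.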
This is the solution I have right now:
gen_df_plot <- function(reg, coef_name){
df <- data.frame(y = reg$coefficients[[coef_name]],
lb = confint(reg, coef_name, level = 0.95)[1],
ub = confint(reg, coef_name, level = 0.95)[2])
return(df)
}
df.plot <- lapply(list(lm15,lm16), gen_df_plot, coef_name = 't')
df.plot <- data.table::rbindlist(df.plot)
df.plot$x <- as.factor(c(2015, 2016))
df.plot %>% ggplot(aes(x, y)) + geom_point(size=4) +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") +
geom_hline(aes(yintercept=0), linetype="dashed") + theme_bw()
I don't love it, but it works.
Here is what might serve as generalized code. I have changed how "x" is defined so that you don't have to worry about alphabetic reordering of the factor.
#
# Paul Gronke and Paul Manson
# Early Voting Information Center at Reed College
#
# August 27, 2019
#
#
# Code to plot a single coefficient from multiple models, provided
# as an easier alternative to "coefplot" and "dotwhisker". Some users
# may find those packages more capable
#
# Code adapted from https://stackoverflow.com/questions/35582052/plot-regression-coefficient-with-confidence-intervals
# gen_df_plot function will create a tidy data frame for your plot
# Currently set up to display 95% confidence intervals
gen_df_plot <- function(reg, coef_name){
df <- data.frame(y = reg$coefficients[[coef_name]],
lb = confint(reg, coef_name, level = 0.95)[1],
ub = confint(reg, coef_name, level = 0.95)[2])
return(df)
}
# Populate the data frame with a list of your model results.
df.plot <- lapply(list(model1, # List your models here
model2),
gen_df_plot,
coef_name = 'x1') # Coefficient name
# Convert the list to a tidy data frame
df.plot <- data.table::rbindlist(df.plot)
# Provide the coefficient or regression labels below, in the
# order that you want them to appear. The "levels=unique(.)" parameter
# overrides R's desire to order the factor alphabetically
df.plot$x <- c("Group 1",
"Group 2") %>%
factor(., levels = unique(.),
ordered = TRUE)
# Create your plot
df.plot %>% ggplot(aes(x, y)) +
geom_point(size=4) +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") +
geom_hline(aes(yintercept=0), linetype="dashed") +
theme_bw() +
ggtitle("Comparing Coefficients") +
ylab("Coefficient Value")
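If you would rather avoid the data.table dependency and keep the labels attached to the models, one possible variation takes a named list of models and builds the same columns (x, y, lb, ub) in base R. The function name coef_table and the mtcars models below are hypothetical and only there to make the sketch runnable.

```r
# Collect one coefficient and its confidence interval from each model
# in a named list; the list names become the x-axis labels.
coef_table <- function(models, coef_name, level = 0.95) {
  rows <- lapply(names(models), function(nm) {
    ci <- confint(models[[nm]], coef_name, level = level)
    data.frame(x = nm,
               y = coef(models[[nm]])[[coef_name]],
               lb = ci[1],
               ub = ci[2])
  })
  out <- do.call(rbind, rows)
  # Keep the order of the list instead of alphabetical order
  out$x <- factor(out$x, levels = names(models))
  rownames(out) <- NULL
  out
}

# Illustrative models: the slope of wt within each cylinder group
fits <- list(
  "8 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 8)),
  "6 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 6)),
  "4 cyl" = lm(mpg ~ wt, data = subset(mtcars, cyl == 4))
)
tab <- coef_table(fits, "wt")
```

The resulting tab can be passed to the same ggplot(aes(x, y)) call as df.plot above, and adding a regression only means adding one more named entry to the list.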