
dplyr and ggplot2: Regression coefficients plot using multiple results in list

(I added a toy example below.)
I would like to make a coefficient plot like this one from multiple linear regression results stored in a list.

First, the results are stored in lmRes_set:

> lmRes_set
Source: local data frame [5 x 3]
Groups: <by row>

# A tibble: 5 × 3
   iter            lmResRand            lmResBias
* <dbl>               <list>               <list>
1     1 <data.frame [7 × 5]> <data.frame [7 × 5]>
2     2 <data.frame [6 × 5]> <data.frame [6 × 5]>
3     3 <data.frame [6 × 5]> <data.frame [6 × 5]>

lmResRand and lmResBias are different models, so I would like to draw one figure per model. iter is the dataset id: the same model was fitted to each of several datasets.

I started with lmResRand. Each result looks like this (I applied tidy() to the lm() output; the first two results are shown):

> lmRes_set[[2]]
[[1]]
              term    estimate  std.error   statistic    p.value
1      (Intercept) 44.08651614 12.7941054  3.44584594 0.04106068
2   treatment_rand -1.00512172 10.4939360 -0.09578119 0.92973389
3              age  0.06060037  0.1618392  0.37444793 0.73297251
4 factor(partyID)4  3.32417091 10.8821091  0.30547120 0.77997631
5 factor(partyID)7 -1.10496660 11.2153538 -0.09852267 0.92773124
6 factor(partyID)8 23.71469512 10.9822559  2.15936465 0.11964723
7           gender  9.75602334  7.2412885  1.34727726 0.27061498

[[2]]
              term   estimate  std.error  statistic    p.value
1      (Intercept) 77.2206176 19.0869356  4.0457315 0.01553032
2   treatment_rand -7.9928511 12.0717024 -0.6621147 0.54409765
3              age -0.4003288  0.2493055 -1.6057762 0.18359582
4 factor(partyID)5 31.5800803 12.0126701  2.6288976 0.05825344
5 factor(partyID)7  6.1992841  6.8926315  0.8994074 0.41928587
6           gender  1.0162926  7.5365958  0.1348477 0.89924554

I would like to plot the treatment_rand estimate from each result together with its 95% confidence interval.
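For a single tidied result, the interval can be computed directly from the estimate and std.error columns. The following is a minimal sketch in base R using the treatment_rand row of the first result above. The residual degrees of freedom (`df_resid`) is an assumption: the question does not state the sample size, and 3 is only chosen because it is consistent with the reported p-values.

```r
# Values for treatment_rand copied from the first tidy() result above
estimate  <- -1.00512172
std_error <- 10.4939360

# Assumption: 3 residual degrees of freedom (not stated in the question)
df_resid <- 3

# t-based multiplier for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

conf_low  <- estimate - t_crit * std_error
conf_high <- estimate + t_crit * std_error
c(conf_low = conf_low, conf_high = conf_high)
```

broom can also return these bounds itself: tidy() on an lm object accepts conf.int = TRUE and conf.level, which adds conf.low and conf.high columns.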

I was not sure where to start, so I first tried the following code:

> lmRes_set %>% 
+     group_by(iter) %>%
+     select(lmResBias)
Adding missing grouping variables: `iter`
Source: local data frame [5 x 2]
Groups: iter [5]

   iter            lmResBias
* <dbl>               <list>
1     1 <data.frame [7 × 5]>
2     2 <data.frame [6 × 5]>
3     3 <data.frame [6 × 5]>
4     4 <data.frame [7 × 5]>
5     5 <data.frame [5 × 5]>
Warning message:
Grouping rowwise data frame strips rowwise nature

This only selects the list column; I still cannot access the data frames stored inside it.
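select() picks columns, so the tidied data frames stay nested inside the list column. A minimal sketch of two ways to reach them, using a small hypothetical stand-in (`toy_set`, with made-up rounded values) rather than the real lmRes_set:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for lmRes_set: one tidy()-style data frame per iter
toy_set <- tibble(
  iter = c(1, 2),
  lmResRand = list(
    tibble(term = c("(Intercept)", "treatment_rand"),
           estimate = c(44.1, -1.0), std.error = c(12.8, 10.5)),
    tibble(term = c("(Intercept)", "treatment_rand"),
           estimate = c(77.2, -8.0), std.error = c(19.1, 12.1))
  )
)

# One element of the list column: $ selects the column, [[ ]] the element
first_result <- toy_set$lmResRand[[1]]

# All elements at once: unnest() stacks them and repeats iter for each row
treatment_rows <- toy_set %>%
  select(iter, lmResRand) %>%
  unnest(lmResRand) %>%
  filter(term == "treatment_rand")
```

After unnest(), each row carries its own iter value, so the result can be passed to ggplot() directly.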

Added section: toy example
Here is a simpler version of the original code. It creates a small dataset and runs into the same problem when I try to make a plot like this (point estimates with 95% confidence intervals). For example, I want to plot the three coefficients of age in model1 (one for each of the three categories in setid) on the same plot.

library(tidyverse);library(broom)
# Data Creation
population_num <- 50
population <- tibble(
    gender = as.numeric(rbinom(population_num, 1, 0.5)),
    age=rnorm(population_num, mean=50, sd=20),
    score=rnorm(population_num, mean=80, sd=30),
    setid=sample(c(1,2,3), size=population_num, replace=TRUE)
    )
# Analysis
lmRes_set <- population %>%
    group_by(setid) %>%
    do(model1=tidy(lm(score~age, data=.)),
         model2=tidy(lm(score~age+gender, data=.)))
# Make plot for model 1: this is where I get stuck
temp <- lmRes_set %>%
    group_by(setid) %>%
    select(model1)

I posted a similar question on the Japanese Stack Overflow, and @cuttlefish44 gave me a wonderful answer. I have translated their answer into English.


lmRes_set is not in "long" format, which makes it awkward to use in the later steps. It is easier to convert the results to "long" format at the start.

temp <- population %>%
  group_by(setid) %>%
  do(model1 = tidy(lm(score ~ age, data = .)),
     model2 = tidy(lm(score ~ age + gender, data = .))) %>%   ## same as in the question
  gather(model_name, model, -setid) %>%                        ## reshape to long format
  unnest(model) %>%                                            ## expand the nested data frames into rows
  filter(term == "age")                                        ## keep only the age coefficient
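do() and gather() are marked as superseded in current dplyr and tidyr. A sketch of the same reshaping with nest(), purrr::map() and pivot_longer() follows; the name `temp_alt` and the set.seed() call are additions for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)  # added here only to make the toy data reproducible
population <- tibble(
  gender = as.numeric(rbinom(50, 1, 0.5)),
  age    = rnorm(50, mean = 50, sd = 20),
  score  = rnorm(50, mean = 80, sd = 30),
  setid  = sample(c(1, 2, 3), size = 50, replace = TRUE)
)

temp_alt <- population %>%
  nest(data = -setid) %>%                        # one nested data frame per setid
  mutate(
    model1 = map(data, ~ tidy(lm(score ~ age, data = .x))),
    model2 = map(data, ~ tidy(lm(score ~ age + gender, data = .x)))
  ) %>%
  select(-data) %>%
  pivot_longer(c(model1, model2),
               names_to = "model_name", values_to = "model") %>%  # long format
  unnest(model) %>%                              # expand nested data frames into rows
  filter(term == "age")                          # keep only the age coefficient
```

The result has one row per combination of setid and model_name, with the same columns that the plotting code uses (estimate and std.error).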


## same as the website linked in the question (some parts are skipped)
interval1 <- -qnorm((1-0.95)/2)   ## multiplier for a 95% interval (normal approximation)

ggplot(temp, aes(colour = as.factor(setid))) +
  geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
  geom_linerange(aes(x = model_name, ymin = estimate - std.error*interval1,
                     ymax = estimate + std.error*interval1),
                 lwd = 1, position = position_dodge(width = 1/2)) +
  coord_flip()
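The question asks for point estimates with intervals, while geom_linerange() draws only the interval bars. A minimal sketch with geom_pointrange(), which draws both, is shown below. `toy_temp` holds made-up values in the same shape as `temp`, and `z95` is a name introduced here for the normal-approximation multiplier.

```r
library(ggplot2)

# Hypothetical values in the same shape as `temp` above
toy_temp <- data.frame(
  setid      = rep(1:3, times = 2),
  model_name = rep(c("model1", "model2"), each = 3),
  estimate   = c(0.10, -0.25, 0.30, 0.12, -0.20, 0.28),
  std.error  = c(0.20, 0.22, 0.18, 0.21, 0.23, 0.19)
)

z95 <- qnorm(0.975)  # normal-approximation multiplier for a 95% interval

p <- ggplot(toy_temp,
            aes(x = model_name, y = estimate,
                ymin = estimate - z95 * std.error,
                ymax = estimate + z95 * std.error,
                colour = as.factor(setid))) +
  geom_hline(yintercept = 0, colour = "grey50", linetype = 2) +
  geom_pointrange(position = position_dodge(width = 1/2)) +  # point plus interval
  coord_flip() +
  labs(x = NULL, y = "Coefficient of age", colour = "setid")

print(p)
```

With small groups, a t-based multiplier such as qt(0.975, df) gives wider and more appropriate intervals than the normal approximation used here.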

