
Visualising the effect of a continuous predictor on a dichotomous outcome using ggplot2

My dataset has two columns. Column 1: a dichotomous variable, 'Yes' or 'No'. Column 2: a continuous predictor ranging from 3 to 6.

In base R, I can easily visualise the effect of this continuous predictor on the probability of the outcome simply by using plot(outcome ~ predictor). If I do so, I get a graph that looks something like this:

[Image: base R plot]

I can't replicate this type of plot using ggplot2, and I can't find any examples of other people using what seems to me a simple way to visualise such data. If anyone could explain how to produce this plot using ggplot2, I'd be most grateful.
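For context on what the base R call does: with a factor on the left of the formula and a numeric predictor on the right, plot() dispatches to plot.factor(), which in turn calls spineplot() and draws a spinogram. The predictor is binned as in a histogram, each bar's width is proportional to the number of observations in its bin, and each bar is split by the share of 'No' and 'Yes'. A minimal sketch with simulated data; the variable names, the logistic relationship and the half-unit breaks are assumptions for illustration only:

```r
# Hypothetical toy data shaped like the question's: a continuous
# predictor between 3 and 6 and a Yes/No factor outcome.
set.seed(42)
predictor <- runif(200, min = 3, max = 6)

# Assumed relationship: "Yes" becomes more likely as the predictor grows.
p_yes <- plogis(-6 + 1.5 * predictor)
outcome <- factor(ifelse(runif(200) < p_yes, "Yes", "No"), levels = c("No", "Yes"))

# Same kind of picture as plot(outcome ~ predictor): a spinogram.
# spineplot() bins the predictor like hist() and returns the counts invisibly.
tab <- spineplot(outcome ~ predictor, breaks = seq(3, 6, by = 0.5))
print(tab)
```

The returned table of counts per bin and outcome is a convenient way to check which bins the plot used.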

You could approach this using geom_rect as follows:

First, some toy data:

x <- runif(1000)               # continuous predictor on (0, 1)
y <- rbinom(1000, 1, 0.2)      # binary outcome, 1 with probability 0.2
df <- data.frame(x, y)

Now make a new data frame containing the coordinates of each rectangle. You'll need to decide how to break up the x axis: evenly, by quantiles, or however you like. I've chosen some arbitrary values:

limits <- c(0,.3,.9,1)
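If you would rather use quantiles, as mentioned above, here is a minimal sketch (toy data regenerated so it runs on its own). quantile() gives break points that put roughly a quarter of the observations in each bin; include.lowest = TRUE is needed because cut() intervals are open on the left, so the minimum value would otherwise become NA:

```r
# Toy data as in this answer, with a seed so the result is reproducible.
set.seed(1)
x <- runif(1000)

# Quartile-based break points: min, 25%, 50%, 75%, max.
limits <- unname(quantile(x, probs = seq(0, 1, by = 0.25)))

# include.lowest = TRUE keeps min(x) in the first interval.
xcut <- cut(x, limits, include.lowest = TRUE)
table(xcut)
```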

Now find the proportion of y == 1 in each group:

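df$xcut <- cut(df$x, limits)                     # assign each x to its interval
df2 <- aggregate(y ~ xcut, data = df, FUN = mean) # proportion of y == 1 per interval
# (assumes every interval contains at least one observation)
df2$max <- limits[-1]                            # right edge of each rectangle
df2$min <- limits[-length(limits)]               # left edge of each rectangle
df2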

       xcut         y max min
1   (0,0.3] 0.2052980 0.3 0.0
2 (0.3,0.9] 0.2128378 0.9 0.3
3   (0.9,1] 0.2358491 1.0 0.9

Now you have everything you need for geom_rect:

library(ggplot2)
ggplot(df2) +
  geom_rect(aes(xmin = min, xmax = max, ymin = 0, ymax = y), fill = "white", col = "black") +
  labs(y = "Proportion", x = "x") +
  scale_x_continuous(breaks = limits)


You can tweak the y-axis scale and add the 'No' boxes to get the effect you want, although that seems a bit redundant.
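A minimal sketch of adding the 'No' boxes: each interval gets a 'Yes' rectangle from 0 up to its proportion and a 'No' rectangle from that proportion up to 1, with fill mapped to the outcome. The data are generated the same way as in this answer (with a seed added), and the names props and rects are illustrative. The plot is only drawn if ggplot2 is installed:

```r
# Toy data generated the same way as in this answer.
set.seed(1)
df <- data.frame(x = runif(1000), y = rbinom(1000, 1, 0.2))
limits <- c(0, .3, .9, 1)

# Proportion of y == 1 ("Yes") in each interval of x, as a plain vector.
df$xcut <- cut(df$x, limits)
props <- as.vector(tapply(df$y, df$xcut, mean))

# Two rectangles per interval: "Yes" from 0 up to the proportion,
# "No" from the proportion up to 1.
k <- length(props)
rects <- data.frame(
  xmin    = rep(limits[-length(limits)], 2),
  xmax    = rep(limits[-1], 2),
  ymin    = c(rep(0, k), props),
  ymax    = c(props, rep(1, k)),
  outcome = rep(c("Yes", "No"), each = k)
)
rects

# Draw only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(rects) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = outcome),
              colour = "black") +
    labs(x = "x", y = "Proportion") +
    scale_x_continuous(breaks = limits)
  print(p)
}
```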

Here is a solution using base R for the data preparation and ggplot2 for the plot. First, we create some data:

set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

Then we tabulate the binary outcome for each value of the predictor and calculate the 'Yes' and 'No' fractions:

df_table <- aggregate(Dichotomous_outcome ~ Predictor, df, table)

df_table$Yes_fraction <- df_table$Dichotomous_outcome[, "Yes"]/ rowSums(df_table$Dichotomous_outcome)
df_table$No_fraction <- df_table$Dichotomous_outcome[, "No"]/ rowSums(df_table$Dichotomous_outcome)
df_table <- df_table[order(df_table$Predictor), ]
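The same fractions can also be obtained without aggregate(), by cross-tabulating and normalising each row with prop.table(margin = 1). A minimal sketch on the same simulated data; the name fractions is illustrative. Because table() sorts the numeric predictor values, the rows already come out in Predictor order:

```r
# Same simulated data as in this answer.
set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

# Rows: predictor values; columns: "No", "Yes"; each row sums to 1.
fractions <- prop.table(table(df$Predictor, df$Dichotomous_outcome), margin = 1)
head(fractions)
```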

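Now we reshape the data so that each row describes one rectangle. Each interval between two consecutive predictor values gets a 'Yes' rectangle from 0 up to the 'Yes' fraction of its right-hand value, and a 'No' rectangle from there up to 1: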

df_rect <- data.frame(x_min= rep(df_table$Predictor[1:(nrow(df_table)-1)], 2),
                      x_max= rep(df_table$Predictor[2:(nrow(df_table))], 2),
                      y_min= c(rep(0, nrow(df_table)-1), df_table$Yes_fraction[-1]),
                      y_max= c(df_table$Yes_fraction[-1], rep(1, nrow(df_table)-1)),
                      col= rep(c("Yes", "No"), each= nrow(df_table)-1))

Now we can plot it:

library(ggplot2)
ggplot(df_rect) +
  geom_rect(aes(xmin = x_min, xmax = x_max, ymin = y_min, ymax = y_max, fill = col), col = "black") +
  labs(x = "Predictor", y = "Dichotomous outcome") +
  scale_y_continuous(breaks = c(.25, .75), labels = c("Yes", "No"))

[Image: plot]
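One difference from the base R plot in the question: in a spinogram (what base R's spineplot() draws), bar widths are proportional to the number of observations in each bin, whereas the rectangles above are as wide as the gap between neighbouring predictor values. If you want that look, one sketch is to place the bars side by side on a [0, 1] axis, each as wide as its predictor value's share of the data, and label the bars at their midpoints. This positioning is my own approach, not a ggplot2 feature, and the object names (counts, spine, etc.) are illustrative:

```r
# Same simulated data as in this answer.
set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

counts <- table(df$Predictor, df$Dichotomous_outcome)  # rows sorted by Predictor
n_bin  <- as.vector(rowSums(counts))                   # observations per value
yes    <- as.vector(counts[, "Yes"]) / n_bin           # "Yes" fraction per value

# Bar widths proportional to n_bin: bars sit side by side on [0, 1].
right <- cumsum(n_bin) / sum(n_bin)
left  <- c(0, head(right, -1))

k <- length(yes)
spine <- data.frame(
  xmin    = rep(left, 2),
  xmax    = rep(right, 2),
  ymin    = c(rep(0, k), yes),
  ymax    = c(yes, rep(1, k)),
  outcome = rep(c("Yes", "No"), each = k)
)

# Draw only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(spine) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = outcome),
              colour = "black") +
    # Label each bar at its midpoint with the predictor value it represents.
    scale_x_continuous(breaks = (left + right) / 2, labels = rownames(counts)) +
    labs(x = "Predictor", y = "Proportion")
  print(p)
}
```

With many sparsely populated predictor values the outer bars become very thin, so binning the predictor first (as in the geom_rect answer) may read better.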

Perhaps the ggmosaic package can be adapted to suit your needs? For example:

library(tidyverse)
#install.packages("ggmosaic")
library(ggmosaic)

df <- data.frame(dichot = sample(c("Yes", "No"), 25, replace = TRUE),
                 contin = sample(1:6, 25, replace = TRUE))

ggplot(df) +
  geom_mosaic(aes(x = product(contin), fill = dichot))

Created on 2021-11-24 by the reprex package (v2.0.1)
