Visualising the effect of a continuous predictor on a dichotomous outcome using ggplot2
My dataset has two columns.
Column 1: a dichotomous variable, 'Yes' or 'No'.
Column 2: a continuous predictor, which ranges from 3 to 6.
In base R, I can easily visualise the effect of this continuous predictor on the probability of achieving the dichotomous outcome by simply using plot(outcome~predictor). If I do so, I get a graph that looks something like this:
I am unable to replicate this type of plot using ggplot2, nor can I find any examples of other people using what looks to me like a simple way to visualise the data. If anyone could explain how I can produce this plot using ggplot2, I'd be most grateful.
You could approach this using geom_rect as follows:
First, some toy data:
x <- runif(1000)
y <- rbinom(1000,1,0.2)
df <- data.frame(x,y)
Now make a new data frame that includes the coordinates of each rectangle. You'll need to decide how to break up the x axis: evenly, by quantiles, or any other way. I've chosen some arbitrary values:
limits <- c(0,.3,.9,1)
Now I can find the proportion I want for each group:
df$xcut <- cut(df$x, breaks = limits)  # reuse the break points defined above
df2 <- aggregate(data=df, y~xcut, mean)
df2$max <- limits[-1]
df2$min <- limits[-(length(limits))]
df2
       xcut         y max min
1   (0,0.3] 0.2052980 0.3 0.0
2 (0.3,0.9] 0.2128378 0.9 0.3
3   (0.9,1] 0.2358491 1.0 0.9
Now you have everything you need for geom_rect:
library(ggplot2)
ggplot(df2) +
  geom_rect(aes(xmin=min, xmax=max, ymin=0, ymax=y), fill="white", col="black") +
  labs(y="Proportion", x="x") +
  scale_x_continuous(breaks=limits)
You can tweak the y axis scale and add the 'no' boxes to get the effect you want, although that seems a bit redundant.
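The two options mentioned above (quantile-based breaks and adding the 'no' boxes) could be sketched roughly as follows. This is only an illustration on the same kind of toy data: the seed, the choice of quartiles, and the grey fill colours are assumptions, not part of the original answer.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the sketch is reproducible
df <- data.frame(x = runif(1000), y = rbinom(1000, 1, 0.2))

# Break the x axis at its quartiles instead of arbitrary values
limits <- quantile(df$x, probs = seq(0, 1, by = 0.25), names = FALSE)
df$xcut <- cut(df$x, breaks = limits, include.lowest = TRUE)

# Proportion of 1s ("yes") in each bin, plus the bin edges
df2 <- aggregate(y ~ xcut, data = df, FUN = mean)
df2$min <- limits[-length(limits)]
df2$max <- limits[-1]

# Lower rectangle = "yes" proportion, upper rectangle = "no" proportion
p <- ggplot(df2) +
  geom_rect(aes(xmin = min, xmax = max, ymin = 0, ymax = y),
            fill = "grey30", colour = "black") +
  geom_rect(aes(xmin = min, xmax = max, ymin = y, ymax = 1),
            fill = "grey80", colour = "black") +
  labs(x = "x", y = "Proportion") +
  scale_y_continuous(limits = c(0, 1))
p
```

With quantile breaks each bin holds roughly the same number of observations, so the bar widths reflect how densely the predictor is sampled in each range.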
Here is a base R and ggplot2 solution. First we create some data:
set.seed(1)
df <- data.frame(Predictor= round(rnorm(10000, 5, 2), 0),
Dichotomous_outcome= factor(sample(c("Yes", "No"), 10000, replace= TRUE)))
Then we tabulate the binary outcome for each value of the predictor and calculate the fractions:
df_table <- aggregate(Dichotomous_outcome ~ Predictor, df, table)
df_table$Yes_fraction <- df_table$Dichotomous_outcome[, "Yes"]/ rowSums(df_table$Dichotomous_outcome)
df_table$No_fraction <- df_table$Dichotomous_outcome[, "No"]/ rowSums(df_table$Dichotomous_outcome)
df_table <- df_table[order(df_table$Predictor), ]
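As a cross-check on the fractions computed above, the same per-value proportions can probably be obtained more compactly with table() and prop.table(). This is a sketch, not the answer's method; it rebuilds the same toy data so it can run on its own, and the name frac_df is made up for illustration.

```r
# Same toy data as in the answer
set.seed(1)
df <- data.frame(
  Predictor = round(rnorm(10000, 5, 2), 0),
  Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE))
)

# Counts of No/Yes per predictor value, then row-wise proportions
counts <- table(df$Predictor, df$Dichotomous_outcome)
fractions <- prop.table(counts, margin = 1)

# One row per predictor value; Yes_fraction + No_fraction is 1 in every row
frac_df <- data.frame(
  Predictor = as.numeric(rownames(fractions)),
  Yes_fraction = as.numeric(fractions[, "Yes"]),
  No_fraction = as.numeric(fractions[, "No"])
)
head(frac_df)
```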
Now we transform the data frame so that we can draw the rectangles:
df_rect <- data.frame(x_min= rep(df_table$Predictor[1:(nrow(df_table)-1)], 2),
x_max= rep(df_table$Predictor[2:(nrow(df_table))], 2),
y_min= c(rep(0, nrow(df_table)-1), df_table$Yes_fraction[-1]),
y_max= c(df_table$Yes_fraction[-1], rep(1, nrow(df_table)-1)),
col= rep(c("Yes", "No"), each= nrow(df_table)-1))
Now we can plot it:
library(ggplot2)
ggplot(df_rect) +
geom_rect(aes(xmin= x_min, xmax= x_max, ymin= y_min, ymax= y_max, fill= col), col= "black") +
labs(x= "Predictor", y= "Dichotomous Outcome") +
scale_y_continuous(breaks= c(.25, .75), labels= c("Yes", "No"))
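If equal-width bars are acceptable, a shorter route that skips the manual rectangle construction might be a histogram with position = "fill", which rescales each bin's stack to proportions. This is a different technique from the geom_rect approach above, shown only as a sketch; the binwidth of 1 is an assumption that suits the rounded toy predictor.

```r
library(ggplot2)

# Same toy data as in the answer
set.seed(1)
df <- data.frame(
  Predictor = round(rnorm(10000, 5, 2), 0),
  Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE))
)

# position = "fill" stacks the two outcome levels and scales each bin to height 1
p <- ggplot(df, aes(x = Predictor, fill = Dichotomous_outcome)) +
  geom_histogram(binwidth = 1, position = "fill", colour = "black") +
  labs(x = "Predictor", y = "Proportion", fill = "Outcome")
p
```

Unlike the base R plot, every bar here has the same width, so the plot shows the conditional proportions but not how many observations fall in each bin.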
Perhaps the ggmosaic package can be adapted to suit your needs? E.g.:
library(tidyverse)
#install.packages("ggmosaic")
library(ggmosaic)
df <- data.frame(dichot = sample(c("Yes", "No"), 25, replace = TRUE),
contin = sample(1:6, 25, replace = TRUE))
ggplot(df) +
geom_mosaic(aes(x = product(contin), fill = dichot))
Created on 2021-11-24 by the reprex package (v2.0.1)