Visualising the effect of a continuous predictor on a dichotomous outcome using ggplot2

My dataset has two columns. Column 1: a dichotomous variable, 'Yes' or 'No'. Column 2: a continuous predictor, which ranges from 3 to 6.

In base R, I can easily visualise the effect of this continuous predictor on the probability of achieving the dichotomous outcome by simply using plot(outcome~predictor). If I do so, I get a graph that looks something like this:

[Figure: base R plot]
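For reference, here is a minimal sketch of the base R call described above. The simulated data and the names `predictor` and `outcome` are assumptions for illustration, not the original dataset. When the left-hand side of the formula is a factor and the right-hand side is numeric, plot() should draw a spinogram (the same display as graphics::spineplot()).

```r
set.seed(1)  # assumption: seed chosen only to make the sketch reproducible

# Hypothetical data: a continuous predictor in [3, 6] and a Yes/No outcome
# whose probability of "Yes" rises with the predictor
predictor <- runif(300, min = 3, max = 6)
outcome   <- factor(ifelse(runif(300) < plogis(predictor - 4.5), "Yes", "No"))

# Factor ~ numeric: base R bins the predictor and stacks the outcome
# proportions within each bin
plot(outcome ~ predictor)
```

The bar widths in this display are proportional to the number of observations in each bin, which is the feature the rectangle-based answers below reproduce by hand.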

I am unable to replicate this type of plot using ggplot2, nor can I find any examples of other people using what looks to me like a simple way to visualise the data. If anyone could explain how to produce this plot using ggplot2, I'd be most grateful.

You could approach this using geom_rect as follows:

First, some toy data:

x <- runif(1000)
y <- rbinom(1000,1,0.2)
df <- data.frame(x,y)

Now make a new data frame that includes the coordinates of each rectangle. You'll need to decide how to break up the x axis: evenly, by quantiles, or any other way. I've chosen some arbitrary values:

limits <- c(0,.3,.9,1)
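As a rough sketch of the other two options mentioned (even spacing and quantiles), the breaks could be computed from the data instead of typed by hand. The names `limits_even` and `limits_q` and the choice of four bins are assumptions made for illustration.

```r
set.seed(42)  # assumption: seed added so the sketch is reproducible
x <- runif(1000)

# Evenly spaced breaks across the observed range (4 bins)
limits_even <- seq(min(x), max(x), length.out = 5)

# Quantile breaks: roughly the same number of observations in each bin
limits_q <- unname(quantile(x, probs = seq(0, 1, by = 0.25)))

# include.lowest = TRUE keeps the minimum value inside the first bin
bin_counts <- table(cut(x, limits_q, include.lowest = TRUE))
bin_counts
```

Either vector can be used in place of `limits` in the steps that follow. With data-driven breaks, `include.lowest = TRUE` matters because the smallest observation sits exactly on the first break.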

Now I can find the proportion I want for each group:

df$xcut <- cut(df$x, limits)
df2 <- aggregate(data=df, y~xcut, mean)
df2$max <- limits[-1]
df2$min <- limits[-(length(limits))]
df2

       xcut         y max min
1   (0,0.3] 0.2052980 0.3 0.0
2 (0.3,0.9] 0.2128378 0.9 0.3
3   (0.9,1] 0.2358491 1.0 0.9

Now you have everything you need for geom_rect:

library(ggplot2)
ggplot(df2) + geom_rect(aes(xmin=min,xmax=max, ymin=0, ymax=y ), fill="white", col="black") +
  labs(y="Proportion",x="x") + 
  scale_x_continuous(breaks=limits)


You can tweak the y-axis scale and add the 'No' boxes to get the effect you want, although that seems a bit redundant.
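One possible way to add the 'No' boxes is to stack a second rectangle from the 'Yes' proportion up to 1 in each bin. This is only a sketch under assumptions: the seed, the `rects` data frame and its column names are introduced here for illustration and are not part of the answer above.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added so the sketch is reproducible
df <- data.frame(x = runif(1000), y = rbinom(1000, 1, 0.2))
limits <- c(0, .3, .9, 1)
df$xcut <- cut(df$x, limits, include.lowest = TRUE)

# Proportion of y == 1 ("Yes") in each bin, plus the bin edges
df2 <- aggregate(y ~ xcut, data = df, FUN = mean)
df2$min <- limits[-length(limits)]
df2$max <- limits[-1]

# One row per rectangle: "Yes" spans 0..y, "No" spans y..1
rects <- rbind(
  data.frame(xmin = df2$min, xmax = df2$max, ymin = 0,     ymax = df2$y, outcome = "Yes"),
  data.frame(xmin = df2$min, xmax = df2$max, ymin = df2$y, ymax = 1,     outcome = "No")
)

p <- ggplot(rects) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = outcome),
            colour = "black") +
  scale_x_continuous(breaks = limits) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = "x", y = "Proportion")
p
```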

Here is a base R and ggplot2 solution. First we create some data:

set.seed(1)
df <- data.frame(Predictor= round(rnorm(10000, 5, 2), 0),
             Dichotomous_outcome= factor(sample(c("Yes", "No"), 10000, replace= TRUE)))

Then we tabulate the binary outcome for each value of the predictor and calculate the fractions:

df_table <- aggregate(Dichotomous_outcome ~ Predictor, df, table)

df_table$Yes_fraction <- df_table$Dichotomous_outcome[, "Yes"]/ rowSums(df_table$Dichotomous_outcome)
df_table$No_fraction <- df_table$Dichotomous_outcome[, "No"]/ rowSums(df_table$Dichotomous_outcome)
df_table <- df_table[order(df_table$Predictor), ]
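The same row-wise fractions can also be obtained with table() and prop.table(). This is an alternative sketch, not the approach used in this answer; the object names `tab` and `fractions` are assumptions.

```r
set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

# Cross-tabulate predictor values (rows) against the outcome (columns)
tab <- table(df$Predictor, df$Dichotomous_outcome)

# margin = 1 divides each row by its row total, so every row sums to 1
fractions <- prop.table(tab, margin = 1)
head(fractions)
```

The rows come out already ordered by predictor value, so no separate sorting step is needed with this variant.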

Now we transform the data frame so that each row describes one rectangle:

df_rect <- data.frame(x_min= rep(df_table$Predictor[1:(nrow(df_table)-1)], 2),
                      x_max= rep(df_table$Predictor[2:(nrow(df_table))], 2),
                      y_min= c(rep(0, nrow(df_table)-1), df_table$Yes_fraction[-1]),
                      y_max= c(df_table$Yes_fraction[-1], rep(1, nrow(df_table)-1)),
                      col= rep(c("Yes", "No"), each= nrow(df_table)-1))

Now we can plot it:

library(ggplot2)
ggplot(df_rect) +
  geom_rect(aes(xmin= x_min, xmax= x_max, ymin= y_min, ymax= y_max, fill= col), col= "black") +
  labs(x= "Predictor", y= "Dichotomous Outcome") +
  scale_y_continuous(breaks= c(.25, .75), labels= c("Yes", "No"))

[Figure: plot]

Perhaps the ggmosaic package can be adapted to suit your needs? For example:

library(tidyverse)
#install.packages("ggmosaic")
library(ggmosaic)

df <- data.frame(dichot = sample(c("Yes", "No"), 25, replace = TRUE),
                 contin = sample(1:6, 25, replace = TRUE))

ggplot(df) +
  geom_mosaic(aes(x = product(contin), fill = dichot))

Created on 2021-11-24 by the reprex package (v2.0.1)
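A related sketch that stays within ggplot2 itself: a histogram with position = "fill" bins the predictor and stacks the outcome proportions in each bin. This is not taken from any of the answers above. The simulated data, the column names and the 0.5-wide bins are assumptions, and unlike the base R plot the bars all have equal width.

```r
library(ggplot2)

set.seed(1)  # assumption: seed chosen only for reproducibility

# Hypothetical data shaped like the question's: predictor in [3, 6], Yes/No outcome
d <- data.frame(predictor = runif(500, min = 3, max = 6))
d$outcome <- factor(ifelse(runif(500) < plogis(d$predictor - 4.5), "Yes", "No"))

# position = "fill" rescales each bin's stacked counts to sum to 1
p <- ggplot(d, aes(x = predictor, fill = outcome)) +
  geom_histogram(position = "fill", breaks = seq(3, 6, by = 0.5), colour = "black") +
  labs(x = "Predictor", y = "Proportion")
p
```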
