
Grouped scatterplot over grouped boxplot in R using ggplot2

I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot data point with the grouped boxplot that it corresponds to.

However, I'd also like the scatterplot points to use different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to use different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:

library(scales)
library(ggplot2) 

# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900), 
           rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)   # i.e. abs(value): keeps every value positive for the log scale
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata, value)

# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.                        
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               pch=15)


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Grouped-Wrong Symbols.png")

#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               aes(shape=Clone))


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Ungrouped-Right Symbols.png")

If anyone has any suggestions, I'd really appreciate it.

Thank you,
Nathan

To get the boxplots to appear, the shape aesthetic needs to be inside geom_point, rather than in the main call to ggplot. The reason is that when the shape aesthetic is in the main ggplot call, it applies to all the geoms, including geom_boxplot. Applying a shape=Clone aesthetic therefore causes geom_boxplot to create a separate boxplot for each level of Clone. Since there's only one row of data for each combination of Gene, variable and Clone, each of those "boxplots" summarizes a single value and no meaningful boxplot is produced.

That the shape aesthetic affects geom_boxplot seems counterintuitive, but it is consistent with how ggplot2 builds its default grouping: the group is the interaction of all discrete variables mapped in a layer, so an inherited shape=Clone splits the boxplots just as fill=Gene does. In any case, moving the shape aesthetic into geom_point solves the problem by applying it only to geom_point.
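A minimal sketch of the grouping behaviour described above, assuming a small hypothetical data frame (toy, p_global and p_local are illustrative names, not from the question). It uses ggplot2's layer_data(), which returns the computed data for one layer of a plot; for a boxplot layer that is one row per box, so counting rows shows how many boxes each version draws.

```r
library(ggplot2)

# Hypothetical toy data: 2 genes x 3 clones, one observation each, at one time point
toy <- expand.grid(Gene = c("GeneA", "GeneB"), Clone = c("D1", "D2", "D3"),
                   stringsAsFactors = FALSE)
toy$variable <- "Day10"
toy$value <- c(1, 2, 3, 4, 5, 6)

# shape=Clone in the main call: inherited by geom_boxplot, so it also splits the groups
p_global <- ggplot(toy, aes(x = variable, y = value, fill = Gene, shape = Clone)) +
  geom_boxplot()

# shape=Clone only in geom_point: the boxplot layer is grouped by x and Gene alone
p_local <- ggplot(toy, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot() +
  geom_point(aes(shape = Clone))

# layer_data() returns the computed data for a layer (one row per box for layer 1 here)
n_boxes_global <- nrow(layer_data(p_global, 1))  # one "box" per Gene x Clone combination
n_boxes_local  <- nrow(layer_data(p_local, 1))   # one box per Gene
print(c(global = n_boxes_global, local = n_boxes_local))
```

With shape in the main call the boxplot layer is expected to report 6 degenerate "boxes" (one per Gene × Clone pair, each built from a single value); with shape only in geom_point it should report 2, one per Gene.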

Then, to get the points to appear with the correct boxplot, we need to add group=Gene inside geom_point, so that position_jitterdodge dodges the points by Gene (as the boxplots are dodged) rather than by every Gene × Clone combination. I also added theme_classic to make the plot easier to read (although it's still very busy):

ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
  stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
  geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7, 
             aes(shape=Clone, group=Gene)) +
  scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x))) +
  theme_classic()
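To see what group=Gene changes for the points, the sketch below inspects the point layer's group column and dodged x positions, again on a hypothetical toy data frame. One deliberate substitution: it uses plain position_dodge(width = 0.8) instead of position_jitterdodge so the x positions are deterministic; both dodge by the layer's group column, and jitterdodge simply adds random jitter on top.

```r
library(ggplot2)

# Hypothetical toy data: 2 genes x 3 clones at a single time point
toy <- expand.grid(Gene = c("GeneA", "GeneB"), Clone = c("D1", "D2", "D3"),
                   stringsAsFactors = FALSE)
toy$variable <- "Day10"
toy$value <- c(1, 2, 3, 4, 5, 6)

base <- ggplot(toy, aes(x = variable, y = value, fill = Gene))

# Default grouping: every discrete mapping (x, Gene, Clone) contributes to the group
p_default <- base +
  geom_point(aes(shape = Clone), position = position_dodge(width = 0.8))
# Explicit group = Gene: points are dodged by Gene only, like the boxplots
p_grouped <- base +
  geom_point(aes(shape = Clone, group = Gene), position = position_dodge(width = 0.8))

pts_default <- layer_data(p_default, 1)
pts_grouped <- layer_data(p_grouped, 1)

n_groups_default <- length(unique(pts_default$group))
n_groups_grouped <- length(unique(pts_grouped$group))

# Distinct dodged x positions (rounded to avoid floating-point noise): one per group
n_x_default <- length(unique(round(pts_default$x, 6)))
n_x_grouped <- length(unique(round(pts_grouped$x, 6)))
print(c(default = n_x_default, grouped = n_x_grouped))
```

Without group=Gene the six Gene × Clone groups are spread across six dodge slots; with it, the points share two slots, one per Gene, which is where the dodged boxplots sit.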


I think the plot would be easier to understand if you use faceting for Gene and the x-axis for variable. Putting time on the x-axis seems more intuitive, while faceting frees up the color aesthetic for the points. With six different clones, it's still difficult (for me at least) to tell the point markers apart, but this looks cleaner to me than the previous version.

library(dplyr)

ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)), 
       aes(x=gsub("Day","",variable), y=value)) +
  stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
  geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
  theme_classic() +
  facet_grid(. ~ Gene) +
  labs(y = "Fold Change", x="Day") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x)))


If you really need to keep the points, it might be better to separate the boxplots and points with some manual dodging:

set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
                        Gene = gsub("Gene","Gene ", Gene)), 
       aes(x=Day - 2, y=value, group=Day)) +
  stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
  geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
             position=position_jitter(width=1, height=0)) +
  theme_classic() +
  facet_grid(. ~ Gene) +
  labs(y="Fold Change", x="Day") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x)))


One more thing: for future reference, you can simplify your data creation code:

Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
value = abs(rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24), 
                  sd=rep(c(0.5,8,900,9000,3000), each=24)))  # abs() matches sqrt(value*value) in the question

Tdata = data.frame(Gene, Clone, variable, value)
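To confirm that the compact rep()/paste0() calls build the same label columns as the question's long-hand code, a quick check could look like this (the *_long and *_short names are hypothetical, chosen only for the comparison):

```r
# Long-hand label columns, copied from the question
Gene_long <- c(rep("GeneA",24), rep("GeneB",24), rep("GeneC",24), rep("GeneD",24), rep("GeneE",24))
Clone_long <- c(rep(c("D1","D2","D3","D4","D5","D6"), 20))
variable_long <- c(rep(c(rep("Day10",6), rep("Day20",6), rep("Day30",6), rep("Day40",6)), 5))

# Compact versions from the answer
Gene_short <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
Clone_short <- rep(paste0("D", 1:6), 20)
variable_short <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)

# identical() is stricter than ==: it also requires the same length and type
checks <- c(Gene     = identical(Gene_long, Gene_short),
            Clone    = identical(Clone_long, Clone_short),
            variable = identical(variable_long, variable_short))
print(checks)
```

All three comparisons should be TRUE. The value column is random, so it can't be checked the same way; it relies on rnorm() using the mean and sd vectors element by element, which rep(..., each=24) lines up with the Gene blocks.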

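Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.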

 
© 2020-2024 STACKOOM.COM