
Grouped scatterplot over grouped boxplot in R using ggplot2

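I'm using ggplot2 to create a grouped boxplot with a scatterplot overlay. I'd like each scatterplot data point to be grouped with its corresponding grouped boxplot.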

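However, I'd also like the scatterplot points to be different symbols. I seem to be able to either group the scatterplot points with the grouped boxplots or give the points different symbols... but not both at once. Below is some sample code to illustrate what's happening: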

library(scales)
library(ggplot2) 

# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900), 
           rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)   # absolute values, so everything is positive for the log scale
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)

# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.                        
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               pch=15)


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Grouped-Wrong Symbols.png")

#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               aes(shape=Clone))


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Ungrouped-Right Symbols.png")

If anyone has any suggestions, I'd greatly appreciate it.

Thanks, Nathan

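For the point shapes to show up, the shape aesthetic has to go in geom_point rather than in the main ggplot call. The reason is that an aesthetic in the main ggplot call applies to every geom, including geom_boxplot, and mapping shape=Clone makes geom_boxplot create a separate boxplot for each level of Clone. Since there is only one row of data for each combination of variable, Gene, and Clone, each of those "boxplots" collapses to a single value and no real boxes are drawn.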

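Having the shape aesthetic affect geom_boxplot seems counterintuitive to me, but perhaps there's a reason I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem, because shape then applies only to geom_point.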
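The grouping behaviour described above can be checked directly. This is a minimal sketch, not part of the original answer: it rebuilds data with the same structure as the question's Tdata and uses ggplot2's layer_data() to count the groups ggplot2 computes for the boxplot layer. By default ggplot2 groups a layer by the interaction of all discrete variables mapped in it, which is why a global shape = Clone splits the boxes even though shape does not change how a box is drawn. The seed and the helper n_box_groups() are illustrative choices.

```r
library(ggplot2)

# Example data with the same structure as the question's Tdata
set.seed(1)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(120,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

# shape mapped in the main call: geom_boxplot() inherits shape = Clone
p_global <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene, shape = Clone)) +
  geom_boxplot() +
  geom_point()

# shape mapped only in geom_point(): the boxplot layer never sees Clone
p_local <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot() +
  geom_point(aes(shape = Clone))

# Hypothetical helper: number of distinct groups ggplot2 built for layer 1 (the boxplots)
n_box_groups <- function(p) length(unique(layer_data(p, 1)$group))

n_box_groups(p_global)  # 120: one group per variable x Gene x Clone (a single row each)
n_box_groups(p_local)   # 20: one group per variable x Gene (six rows each)
```

With the global mapping every "box" summarises one observation; with the local mapping each box summarises the six clones for one gene and day, which is what the question wants.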

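Then, to make the points line up with the correct boxplot, we need to group them by Gene (group=Gene). I also added theme_classic to make the plot easier to read (though it's still pretty busy):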

ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
  stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
  geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7, 
             aes(shape=Clone, group=Gene)) +
  scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x))) +
  theme_classic()

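The question defines point_shapes but never applies it. A minimal sketch, assuming you want those six symbols assigned to the six clones in order, adds scale_shape_manual(values = point_shapes) to the group = Gene fix; layer_data() then shows that the points fall into five groups (one per Gene, matching the dodged boxes) and carry exactly the requested shapes. Note that shapes 0 to 20 are drawn with colour rather than fill, so these points stay in the default black. The seed and the stripped-down styling are illustrative.

```r
library(ggplot2)

# Example data with the same structure as the question's Tdata
set.seed(1)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(120,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

point_shapes <- c(0, 15, 1, 16, 2, 17)  # the question's six symbols, one per Clone

p <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot(outlier.shape = NA, position = position_dodge(width = 0.85)) +
  # group = Gene: points are dodged by Gene only, the same way as the boxes
  geom_point(aes(shape = Clone, group = Gene),
             position = position_jitterdodge(dodge.width = 0.85)) +
  # apply the question's symbols to the Clone levels D1..D6 in order
  scale_shape_manual(values = point_shapes)

pts <- layer_data(p, 2)      # built data for the point layer
length(unique(pts$group))    # 5: one group per Gene
sort(unique(pts$shape))      # 0 1 2 15 16 17
```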

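I think the plot would be easier to understand if you faceted by Gene and put variable (the day) on the x-axis. Having time on the x-axis seems more intuitive, and faceting frees up the fill aesthetic so the points can be colored by Clone. With six different clones it's still hard (for me, at least) to tell the point markers apart, but this seems cleaner to me than the previous version.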

library(dplyr)

ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)), 
       aes(x=gsub("Day","",variable), y=value)) +
  stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
  geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
  theme_classic() +
  facet_grid(. ~ Gene) +
  labs(y = "Fold Change", x="Day") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x)))

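Since six clone markers are still hard to tell apart, one possible tweak (an assumption, not part of the original answer) is to map Clone to both shape and colour in the faceted plot, so each clone differs in two ways. The sketch uses base R sub() instead of dplyr::mutate() for the label changes, and the shape values c(15, 16, 17, 18, 3, 4) are an arbitrary choice of six symbols that are all drawn with colour.

```r
library(ggplot2)

# Example data with the same structure as the question's Tdata
set.seed(1)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(120,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

# Base-R stand-ins for the answer's mutate()/gsub() label tweaks
Tdata$Gene <- sub("Gene", "Gene ", Tdata$Gene)   # "GeneA" -> "Gene A"
Tdata$Day  <- sub("Day", "", Tdata$variable)     # "Day10" -> "10"

p <- ggplot(Tdata, aes(x = Day, y = value)) +
  geom_boxplot(coef = 1, outlier.shape = NA, width = 0.5) +
  # Clone drives both colour and shape (redundant encoding)
  geom_point(aes(colour = Clone, shape = Clone), size = 1.5, alpha = 0.7,
             position = position_jitter(width = 0.2, height = 0)) +
  scale_shape_manual(values = c(15, 16, 17, 18, 3, 4)) +  # arbitrary choice of six symbols
  facet_grid(. ~ Gene) +
  scale_y_log10() +
  labs(y = "Fold Change", x = "Day") +
  theme_classic()

pts <- layer_data(p, 2)
length(unique(pts$PANEL))   # 5: one facet per gene
length(unique(pts$colour))  # 6: one colour per clone
length(unique(pts$shape))   # 6: one shape per clone
```

Because the colour and shape scales map the same variable with the same title, ggplot2 merges them into a single legend.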

If you do want to keep the points, it may be better to separate the boxplots from the points with some manual dodging:

set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
                        Gene = gsub("Gene","Gene ", Gene)), 
       aes(x=Day - 2, y=value, group=Day)) +
  stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
  geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
             position=position_jitter(width=1, height=0)) +
  theme_classic() +
  facet_grid(. ~ Gene) +
  labs(y="Fold Change", x="Day") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x)))

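To see why the manual offsets keep the two layers apart, here is a stripped-down sketch of that plot (a check added for illustration, not from the original answer) that inspects the computed positions with layer_data(). The boxes are centred at Day - 2 and are 4 units wide, so they end at Day; position_jitter(width = 1) keeps every point within 1 unit of Day + 2, so the points start just after Day + 1 and never land on a box.

```r
library(ggplot2)

# Example data with the same structure as the question's Tdata
set.seed(10)
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(120,
                       mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)
Tdata$Day <- as.numeric(substr(Tdata$variable, 4, 5))  # "Day10" -> 10

p <- ggplot(Tdata, aes(x = Day - 2, y = value, group = Day)) +
  # boxes centred at Day - 2 and 4 units wide: at most Day - 4 to Day
  geom_boxplot(outlier.shape = NA, width = 4) +
  # points centred at Day + 2, jittered by at most 1: between Day + 1 and Day + 3
  geom_point(aes(x = Day + 2, fill = Clone), shape = 21,
             position = position_jitter(width = 1, height = 0)) +
  facet_grid(. ~ Gene)

boxes <- layer_data(p, 1)
pts   <- layer_data(p, 2)

sort(unique(boxes$x))                      # 8 18 28 38
all(pts$x %% 10 >= 1 & pts$x %% 10 <= 3)   # TRUE: no point overlaps a box
```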

One more thing: for future reference, you can simplify the data-creation code:

Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
value = abs(rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24), 
                  sd=rep(c(0.5,8,900,9000,3000), each=24)))  # abs() matches sqrt(value*value) in the question

Tdata = data.frame(Gene, Clone, variable, value)
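One way to confirm that a shorter version builds the same data as the question's code is to run both from the same seed and compare them. This works because rnorm() with vectors of means and standard deviations draws its values in order, just like the five separate rnorm() calls. The sketch wraps the single rnorm() call in abs() to mirror the question's sqrt(value*value); the seed 123 is arbitrary.

```r
# Question's original construction
set.seed(123)
Gene <- c(rep("GeneA",24), rep("GeneB",24), rep("GeneC",24), rep("GeneD",24), rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"), 20))
variable <- c(rep(c(rep("Day10",6), rep("Day20",6), rep("Day30",6), rep("Day40",6)), 5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5), rnorm(24, mean = 10, sd = 8),
           rnorm(24, mean = 1000, sd = 900), rnorm(24, mean = 25000, sd = 9000),
           rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)
orig <- cbind(Gene, Clone, variable)
orig <- data.frame(orig)
orig <- cbind(orig, value)

# Simplified construction, same seed
set.seed(123)
simp <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(24*5, mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                       sd   = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
)

# Compare column by column (as.character() also copes with factor columns in R < 4.0)
same_labels <- all(sapply(c("Gene", "Clone", "variable"), function(col)
  identical(as.character(orig[[col]]), as.character(simp[[col]]))))
same_values <- isTRUE(all.equal(orig$value, simp$value))
c(same_labels, same_values)  # TRUE TRUE
```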


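Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.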

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM