
Grouped scatterplot over grouped boxplot in R using ggplot2

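I'm using ggplot2 to create a grouped boxplot with a scatterplot overlay. I'd like each scatterplot data point to be grouped with its corresponding grouped boxplot.

However, I'd also like the scatterplot points to be different symbols. I seem to be able to either group the scatterplot points with the grouped boxplots or make the scatterplot points different symbols... but not both at once. Below is some example code to illustrate what's happening: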

library(scales)
library(ggplot2) 

# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900), 
           rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)  # absolute value, so every point can be drawn on the log10 y-axis
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)

# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.                        
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               pch=15)


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Grouped-Wrong Symbols.png")

#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               aes(shape=Clone))


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Ungrouped-Right Symbols.png")

If anyone has any suggestions, I'd appreciate it.

Thanks, Nathan

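For the geom_point shapes to show up, the shape aesthetic needs to be in geom_point, not in the main ggplot call. The reason is that when the shape aesthetic is in the main ggplot call, it applies to all geoms, including geom_boxplot. However, applying a shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there is only one row of data for each combination of variable, Gene and Clone, no meaningful boxplot can be drawn.

That the shape aesthetic affects geom_boxplot seems counterintuitive to me, but maybe there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem by applying it only to geom_point.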
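One way to check this explanation is to count the boxes ggplot2 actually computes. The sketch below is a minimal check, assuming ggplot2 is installed; `demo` is a hypothetical stand-in with the same layout as `Tdata` (5 genes x 4 days x 6 clones, one row each), and `layer_data()` is ggplot2's helper that returns the computed data for a single layer. With `shape = Clone` in the main call, the boxplot layer should end up with 120 single-observation boxes, versus 20 boxes when `shape` is set only in `geom_point`.

```r
library(ggplot2)

# Hypothetical stand-in for Tdata: 5 genes x 4 days x 6 clones = 120 rows,
# i.e. exactly one row per (variable, Gene, Clone) combination
set.seed(1)
demo <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(120))
)

# shape = Clone in the main ggplot() call is inherited by geom_boxplot,
# so the boxplot layer is split by variable, Gene and Clone
shape_global <- ggplot(demo, aes(x = variable, y = value, fill = Gene, shape = Clone)) +
  geom_boxplot()

# shape = Clone set only inside geom_point(): the boxplot layer is split
# by variable and Gene only
shape_local <- ggplot(demo, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot() +
  geom_point(aes(shape = Clone))

# stat_boxplot returns one summary row per group, so nrow() counts the boxes
n_boxes_global <- nrow(layer_data(shape_global, 1))
n_boxes_local  <- nrow(layer_data(shape_local, 1))

n_boxes_global  # 120: one box per single observation
n_boxes_local   # 20: 4 days x 5 genes
```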

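Then, to get the points to line up with the correct boxplot, we need to group them by Gene (group=Gene inside geom_point). I also added theme_classic to make the plot easier to read (though it's still quite busy):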

ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
  stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
  geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7, 
             aes(shape=Clone, group=Gene)) +
  scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x))) +
  theme_classic()
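Roughly speaking, `group=Gene` matters because of how ggplot2 assigns groups: when no `group` is given, it uses the interaction of every discrete variable in the layer, and `position_jitterdodge()` then dodges the points by group within each x position. With `shape=Clone` that gives 30 groups per day (5 genes x 6 clones) instead of the 5 that line up with the boxes. The sketch below counts the groups in the point layer under both settings; it assumes ggplot2 is installed and uses a hypothetical `demo` data frame shaped like `Tdata`.

```r
library(ggplot2)

# Hypothetical stand-in for Tdata (same 5 x 4 x 6 layout, 120 rows)
set.seed(1)
demo <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5),
  value    = abs(rnorm(120))
)

# No explicit group: ggplot2 groups by the interaction of all discrete
# variables in the layer, here variable (x), Gene (fill) and Clone (shape)
p_default <- ggplot(demo, aes(x = variable, y = value, fill = Gene)) +
  geom_point(aes(shape = Clone),
             position = position_jitterdodge(dodge.width = 0.85))

# group = Gene overrides the default, leaving one group per gene,
# which matches how the boxplots are dodged
p_grouped <- ggplot(demo, aes(x = variable, y = value, fill = Gene)) +
  geom_point(aes(shape = Clone, group = Gene),
             position = position_jitterdodge(dodge.width = 0.85))

n_default <- length(unique(layer_data(p_default)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))

n_default  # 120 groups, i.e. 30 dodge slots per day
n_grouped  # 5 groups, one per gene
```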


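I think the plot would be easier to understand if you facet by Gene and use variable for the x-axis. Having time on the x-axis seems more intuitive, and faceting frees up colour (here, the fill of shape 21) to distinguish the clones. With six different clones it's still hard (for me, at least) to tell the point markers apart, but this seems cleaner to me than the previous version.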

library(dplyr)

ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)), 
       aes(x=gsub("Day","",variable), y=value)) +
  stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
  geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
  theme_classic() +
  facet_grid(. ~ Gene) +
  labs(y = "Fold Change", x="Day") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x)))


If you really do need the points, it might be better to separate the boxplots and the points with some manual dodging:

set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
                        Gene = gsub("Gene","Gene ", Gene)), 
       aes(x=Day - 2, y=value, group=Day)) +
  stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
  geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
  geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
             position=position_jitter(width=1, height=0)) +
  theme_classic() +
  facet_grid(. ~ Gene) +
  labs(y="Fold Change", x="Day") +
  expand_limits(y=c(0.01,10^5)) +
  scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
                labels=trans_format("log10", math_format(10^.x)))


One more thing: for future reference, you can simplify the data-creation code:

Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
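# abs() matches the question's sqrt(value*value), keeping all values positive for the log scale
value = abs(rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24), 
                  sd=rep(c(0.5,8,900,9000,3000), each=24)))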

Tdata = data.frame(Gene, Clone, variable, value)
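A quick sanity check that the shortened version builds the same data as the question's code is to run both under the same seed and compare them. This is a base-R sketch; the `_q` and `_s` suffixes are hypothetical names used only to keep the two versions apart, and the values are compared before taking the absolute value. Because `rnorm()` with vectorised `mean` and `sd` draws its values in order, the random values should match as well as the labels.

```r
# Question's original construction
set.seed(42)
Gene_q     <- c(rep("GeneA",24), rep("GeneB",24), rep("GeneC",24),
                rep("GeneD",24), rep("GeneE",24))
Clone_q    <- rep(c("D1","D2","D3","D4","D5","D6"), 20)
variable_q <- rep(c(rep("Day10",6), rep("Day20",6), rep("Day30",6), rep("Day40",6)), 5)
value_q    <- c(rnorm(24, mean = 0.5, sd = 0.5), rnorm(24, mean = 10, sd = 8),
                rnorm(24, mean = 1000, sd = 900), rnorm(24, mean = 25000, sd = 9000),
                rnorm(24, mean = 8000, sd = 3000))

# Simplified construction, same seed
set.seed(42)
Gene_s     <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
Clone_s    <- rep(paste0("D", 1:6), 20)
variable_s <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)
value_s    <- rnorm(24*5, mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                    sd = rep(c(0.5, 8, 900, 9000, 3000), each = 24))

# Each comparison should print TRUE
identical(Gene_q, Gene_s)
identical(Clone_q, Clone_s)
identical(variable_q, variable_s)
isTRUE(all.equal(value_q, value_s))
```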


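Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.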

 
© 2020-2024 STACKOOM.COM