
How to add different boxplots to the same plot based on different data sources in ggplot / R?

Please find My Data below. Please note that the picture below is an example of the design I wish to copy and does not correspond to My Data specifically.

My Data is stored in p. I have a continuous covariate p$ki67pro, which denotes the percentage of cells actively dividing in a tumor sample (thus ranging from 0 to 100). I have three different tumor grades, which correspond to p$WHO.Grade == 1, 2, 3. Each sample represents a tumor patient who either had a recurrence (p$recurrence == 1) or not (p$recurrence == 0).

Therefore:

head(p)
   WHO.Grade recurrence ki67pro
1          1          0       1
2          2          0      12
3          1          0       3
9          1          0       3
10         1          0       5
11         1          0       3

I wish to produce the boxplot below. As you can see, there are four positions on the x-axis, one for each p$WHO.Grade and one for All samples, and there are two boxplots at each position.

[Image: example of the desired plot design]

For each p$WHO.Grade and for All, I want one boxplot to represent p$ki67pro for recurrent tumors (p$recurrence == 1) and the other boxplot to represent p$ki67pro for non-recurrent tumors (p$recurrence == 0).

I.e.:

p$ki67pro[p$WHO.Grade==1 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==1 & p$recurrence==1]

p$ki67pro[p$WHO.Grade==2 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==2 & p$recurrence==1]

p$ki67pro[p$WHO.Grade==3 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==3 & p$recurrence==1]

And for All:

p$ki67pro[p$recurrence==0] versus p$ki67pro[p$recurrence==1]
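The comparisons listed above can be checked numerically before any plotting. Below is a minimal base-R sketch; the helper name summarise_ki67 and the small toy data frame are hypothetical stand-ins, and with the real data you would call summarise_ki67(p). It treats All as simply the whole data set, split only by recurrence.

```r
# Hypothetical helper: sample size and median of ki67pro for each
# WHO grade plus "All", separately for recurrence == 0 and == 1.
summarise_ki67 <- function(d) {
  grades <- c(sort(unique(as.character(d$WHO.Grade))), "All")
  rows <- list()
  for (g in grades) {
    # "All" uses every row; otherwise keep only the current grade
    sub <- if (g == "All") d else d[as.character(d$WHO.Grade) == g, ]
    for (r in c(0, 1)) {
      v <- sub$ki67pro[sub$recurrence == r]
      rows[[length(rows) + 1]] <- data.frame(
        grade = g, recurrence = r, n = length(v),
        median = if (length(v) > 0) median(v) else NA_real_,
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

# Hypothetical toy data with the same columns as p
toy <- data.frame(WHO.Grade  = c(1, 1, 1, 2, 2, 3),
                  recurrence = c(0, 1, 0, 0, 1, 1),
                  ki67pro    = c(2, 10, 4, 12, 20, 25))
res <- summarise_ki67(toy)
print(res)
```

Empty combinations (such as grade 3 without recurrence in this toy data) show up with n = 0 and a missing median.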

I have used the following script so far, but I can't figure out how to get All included. Please note that there is only one case with p$WHO.Grade == 3.

library(ggplot2)

df <- data.frame(x = as.factor(c(p$WHO.Grade)),
                 y = c(p$ki67pro),
                 f = rep(c("ki67pro"), c(nrow(p))))

df <- df[!is.na(df$x),]
ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none")

My Data (p):

p <- structure(list(WHO.Grade = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), recurrence = c(0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), ki67pro = c(1L, 12L, 
3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L, 3L, 15L, 4L, 5L, 7L, 
8L, 3L, 12L, 10L, 4L, 10L, 7L, 3L, 2L, 3L, 7L, 4L, 7L, 10L, 4L, 
5L, 5L, 3L, 5L, 2L, 5L, 3L, 3L, 3L, 4L, 4L, 3L, 2L, 5L, 1L, 5L, 
2L, 3L, 1L, 2L, 3L, 3L, 5L, 4L, 20L, 5L, 0L, 4L, 3L, 0L, 3L, 
4L, 1L, 2L, 20L, 2L, 3L, 5L, 4L, 8L, 1L, 4L, 5L, 4L, 3L, 6L, 
12L, 3L, 4L, 4L, 2L, 5L, 3L, 3L, 3L, 2L, 5L, 4L, 2L, 3L, 4L, 
3L, 3L, 2L, 2L, 4L, 7L, 4L, 3L, 4L, 2L, 3L, 6L, 2L, 3L, 10L, 
5L, 10L, 3L, 10L, 3L, 4L, 5L, 2L, 4L, 3L, 4L, 4L, 4L, 5L, 3L, 
12L, 5L, 4L, 3L, 2L, 4L, 3L, 4L, 2L, 1L, 6L, 1L, 4L, 12L, 3L, 
4L, 3L, 2L, 6L, 5L, 4L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 5L, 4L, 1L, 
3L, 3L, 4L, 0L, 3L)), class = "data.frame", row.names = c(1L, 
2L, 3L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 44L, 45L, 46L, 47L, 48L, 
49L, 50L, 51L, 52L, 53L, 54L, 55L, 57L, 59L, 60L, 61L, 62L, 63L, 
64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 87L, 89L, 90L, 91L, 
92L, 93L, 94L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 
105L, 106L, 107L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 
117L, 118L, 119L, 120L, 121L, 123L, 124L, 125L, 126L, 127L, 128L, 
130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 
141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L, 
152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L, 
163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L, 
174L, 175L))

What about something like this:

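library(ggplot2)

# duplicate the original data
p1 <- p
# label every row of the copy as 'all' so it forms its own x group
p1$WHO.Grade <- 'all'
# stack copy and original in a new object so that p itself is not overwritten
p_all <- rbind(p1, p)

ggplot(p_all) +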
  geom_boxplot(aes(as.factor(WHO.Grade),
                   y = ki67pro,
                   fill = factor(recurrence),
                   color = factor(recurrence)),
               outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  # from here it's more or less your code
  scale_x_discrete(name = "", label = c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name = "x", breaks = seq(0,30,5), limits = c(0,30)) +
  stat_boxplot(aes(as.factor(WHO.Grade),
                   y = ki67pro,
                   color = factor(recurrence)),
               geom = "errorbar", width = 0.3, position = position_dodge(0.7753)) +
  geom_point(aes(as.factor(WHO.Grade),
                 y = ki67pro,
                 color = factor(recurrence)),
             size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) +
  theme(legend.position = "none",
        panel.background = element_blank(),
        axis.line = element_line(colour = "black"))

[Image: resulting plot]
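Because scale_x_discrete() assigns the labels vector to the factor levels in order, this approach relies on "all" ending up after "1", "2" and "3". The sketch below repeats the duplication step but sets the levels explicitly, so the order no longer depends on how strings sort; the helper name add_all_group and the toy data are hypothetical, and with the real data you would call add_all_group(p).

```r
# Hypothetical helper: stack the data with a copy whose WHO.Grade is
# relabelled, and return WHO.Grade as a factor with the new group last.
add_all_group <- function(d, label = "all") {
  copy <- d
  copy$WHO.Grade <- label
  stacked <- rbind(d, copy)  # WHO.Grade becomes character here
  grade_levels <- c(sort(unique(as.character(d$WHO.Grade))), label)
  stacked$WHO.Grade <- factor(stacked$WHO.Grade, levels = grade_levels)
  stacked
}

# Hypothetical toy data with the same columns as p
toy <- data.frame(WHO.Grade  = c(1L, 2L, 3L, 1L),
                  recurrence = c(0L, 1L, 1L, 1L),
                  ki67pro    = c(3L, 12L, 20L, 5L))
stacked <- add_all_group(toy)
levels(stacked$WHO.Grade)  # "1" "2" "3" "all"
nrow(stacked)              # twice nrow(toy)
```

The result can go straight into the ggplot() call above, since as.factor() returns an existing factor unchanged.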

A trick you can use is to create a new level in WHO.Grade, which currently has only 3 levels. This should be a temporary level, so a good way of doing it is with the dplyr package's mutate() function.

Note that there is no need to create a new data frame df.

library(ggplot2)
library(dplyr)

p %>%
  bind_rows(p %>% mutate(WHO.Grade = 4)) %>%
  mutate(WHO.Grade = factor(WHO.Grade),
         recurrence = factor(recurrence)) %>%
  ggplot(aes(WHO.Grade, ki67pro, 
             fill = recurrence, colour = recurrence)) +
  geom_boxplot(outlier.alpha = 0, 
               position = position_dodge(width = 0.78, preserve = "single")) +
  geom_point(size = 3, shape = 21, 
             position = position_jitterdodge()) +
  scale_x_discrete(name = "", 
                   label = c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name = "x", breaks=seq(0,30,5), limits=c(0,30)) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + 
  theme(legend.position="none")

[Image: resulting plot]
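The preserve = "single" argument matters here because not every grade has both recurrence groups: in p, the only WHO.Grade == 3 sample has recurrence == 1, so with the default dodging that lone box would be drawn at the full width of its x position instead of the width of a single box. A quick cross-tabulation shows which combinations are empty. The helper name grade_counts and the toy data are hypothetical; with the real data you would call grade_counts(p).

```r
# Hypothetical helper: number of samples per WHO grade and recurrence
# status, with an extra "All" row summing over grades.
grade_counts <- function(d) {
  tab <- table(grade = d$WHO.Grade,
               recurrence = factor(d$recurrence, levels = c(0, 1)))
  rbind(tab, All = colSums(tab))
}

# Hypothetical toy data: grade 3 has a single, recurrent sample, as in p
toy <- data.frame(WHO.Grade  = c(1L, 1L, 2L, 2L, 3L),
                  recurrence = c(0L, 1L, 0L, 1L, 1L),
                  ki67pro    = c(3L, 5L, 12L, 8L, 20L))
counts <- grade_counts(toy)
print(counts)
```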

If your dataset is too large to simply double in size, you can create two plots and put them next to each other with grid.arrange().

library(ggplot2)
library(gridExtra)


#the data
df <- data.frame(x = as.factor(c(p$WHO.Grade)),
                 y = p$ki67pro,
                 f = as.factor(p$recurrence))

df <- df[!is.na(df$x),]


# plot 1  

plot1 <- ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III")) +
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none") +
  theme(plot.margin = unit(c(1,-0.5,1, 1), "cm"))


#plot 2

plot2 <- ggplot(df) +
  geom_boxplot(aes(x = "All", y = y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "") +
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x = "All", y = y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x = "All", y = y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none") +
  theme(axis.line.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        plot.margin = unit(c(1,1,1, -0.5), "cm"))

#put it together

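# plot1 spans three of the four layout columns, plot2 the last one
# (named layout_mat so that it does not mask stats::lm)
layout_mat <- rbind(c(1, 1, 1, 2))

grid.arrange(plot1, plot2, layout_matrix = layout_mat)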

[Image: resulting plot]
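Another way to avoid doubling the raw data, offered here as an assumption rather than as part of the answer above, is to compute the box statistics for each grade and for All up front and draw them with geom_boxplot(stat = "identity") using the ymin, lower, middle, upper and ymax aesthetics. The helper name box_stats_with_all and the toy data are hypothetical. Note that boxplot.stats() uses Tukey's hinges (fivenum()), which can differ slightly from the quantiles ggplot2's stat_boxplot() uses, and the summary table has no individual points for geom_point().

```r
# Hypothetical helper: Tukey box statistics of ki67pro for every
# (grade, recurrence) pair present, plus an "All" group per recurrence,
# without stacking a second copy of the data.
box_stats_with_all <- function(d) {
  grade <- as.character(d$WHO.Grade)
  keys <- unique(rbind(
    data.frame(g = grade, r = d$recurrence, stringsAsFactors = FALSE),
    data.frame(g = "All", r = sort(unique(d$recurrence)),
               stringsAsFactors = FALSE)
  ))
  rows <- lapply(seq_len(nrow(keys)), function(i) {
    # for "All" every row is in the group; otherwise match the grade
    in_group <- keys$g[i] == "All" | grade == keys$g[i]
    v <- d$ki67pro[in_group & d$recurrence == keys$r[i]]
    s <- boxplot.stats(v)$stats  # whisker, hinge, median, hinge, whisker
    data.frame(x = keys$g[i], f = keys$r[i],
               ymin = s[1], lower = s[2], middle = s[3],
               upper = s[4], ymax = s[5], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical toy data with the same columns as p
toy <- data.frame(WHO.Grade  = c(1, 1, 1, 2, 2, 3),
                  recurrence = c(0, 1, 0, 0, 1, 1),
                  ki67pro    = c(2, 10, 4, 12, 20, 25))
stats_df <- box_stats_with_all(toy)
print(stats_df)
```

With ggplot2 loaded, stats_df could then be drawn with something like ggplot(stats_df, aes(x, ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax, fill = factor(f))) + geom_boxplot(stat = "identity", position = position_dodge(preserve = "single")).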

If I understood correctly, you just want to show all of your data in the last boxplot. You can do this easily by duplicating the data while creating the data frame and labelling the duplicate with All.

df <- data.frame(x = as.factor(c(p$WHO.Grade, rep("All", nrow(p)))),
                 y = rep(c(p$ki67pro), 2),
                 f = "ki67pro")

The plotting code remains the same, and you can easily add recurrence.

[Image: example with recurrence added]

However, the plot you're showing above looks odd, as its All boxplot doesn't contain all the data.
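Adding recurrence to this approach only means replacing the constant f with the recurrence status, repeated once for the original rows and once for the All copy. A minimal sketch follows; the helper name make_plot_df and the toy data are hypothetical, and with the real data you would call make_plot_df(p). It also fixes the level order explicitly so that All always comes last.

```r
# Hypothetical helper: long data frame with each sample once under its
# own grade and once under "All"; f holds the recurrence status.
make_plot_df <- function(d) {
  grade <- as.character(d$WHO.Grade)
  data.frame(
    x = factor(c(grade, rep("All", nrow(d))),
               levels = c(sort(unique(grade)), "All")),
    y = rep(d$ki67pro, 2),
    f = factor(rep(d$recurrence, 2))
  )
}

# Hypothetical toy data with the same columns as p
toy <- data.frame(WHO.Grade  = c(1L, 2L, 1L, 3L),
                  recurrence = c(0L, 0L, 1L, 1L),
                  ki67pro    = c(1L, 12L, 3L, 20L))
df <- make_plot_df(toy)
levels(df$x)  # "1" "2" "3" "All"
table(df$x, df$f)
```

The resulting df has the same x, y and f columns as the question's df, so the question's ggplot() code can be reused with it, now with one fill colour per recurrence group.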

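Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.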

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM