
How to add different boxplots to the same plot based on different data sources in ggplot /R?

Please find My Data below. Note that the picture below is only an example of the design I wish to copy and does not correspond to My Data specifically.

My Data is stored in p. I have a continuous covariate p$ki67pro, which denotes the percentage of actively dividing cells in a tumor sample (thus ranging from 0 to 100). There are three tumor grades, which correspond to p$WHO.Grade==1,2,3. Each sample represents a tumor patient who either had a recurrence (p$recurrence==1) or did not (p$recurrence==0).

Therefore:

head(p)
   WHO.Grade recurrence ki67pro
1          1          0       1
2          2          0      12
3          1          0       3
9          1          0       3
10         1          0       5
11         1          0       3

I wish to produce the boxplot below. As you can see, there are four positions on the x-axis: one for each p$WHO.Grade and one for All samples. There are two boxplots per position (each p$WHO.Grade plus All).

[Image: example of the desired boxplot layout]

For each p$WHO.Grade and for All, I want one boxplot to represent p$ki67pro for recurrent tumors (p$recurrence==1) and the other boxplot to represent p$ki67pro for non-recurrent tumors (p$recurrence==0).

That is:

p$ki67pro[p$WHO.Grade==1 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==1 & p$recurrence==1]

p$ki67pro[p$WHO.Grade==2 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==2 & p$recurrence==1]

p$ki67pro[p$WHO.Grade==3 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==3 & p$recurrence==1]

And for All

p$ki67pro[p$recurrence==0] versus p$ki67pro[p$recurrence==1]

I have used the following script so far, but I cannot figure out how to get All included. Please note that there is only one case with p$WHO.Grade==3.

df <- data.frame(x = as.factor(c(p$WHO.Grade)),
                 y = c(p$ki67pro),
                 f = rep(c("ki67pro"), c(nrow(p))))

df <- df[!is.na(df$x),]
ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none")

My Data p

p <- structure(list(WHO.Grade = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), recurrence = c(0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), ki67pro = c(1L, 12L, 
3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L, 3L, 15L, 4L, 5L, 7L, 
8L, 3L, 12L, 10L, 4L, 10L, 7L, 3L, 2L, 3L, 7L, 4L, 7L, 10L, 4L, 
5L, 5L, 3L, 5L, 2L, 5L, 3L, 3L, 3L, 4L, 4L, 3L, 2L, 5L, 1L, 5L, 
2L, 3L, 1L, 2L, 3L, 3L, 5L, 4L, 20L, 5L, 0L, 4L, 3L, 0L, 3L, 
4L, 1L, 2L, 20L, 2L, 3L, 5L, 4L, 8L, 1L, 4L, 5L, 4L, 3L, 6L, 
12L, 3L, 4L, 4L, 2L, 5L, 3L, 3L, 3L, 2L, 5L, 4L, 2L, 3L, 4L, 
3L, 3L, 2L, 2L, 4L, 7L, 4L, 3L, 4L, 2L, 3L, 6L, 2L, 3L, 10L, 
5L, 10L, 3L, 10L, 3L, 4L, 5L, 2L, 4L, 3L, 4L, 4L, 4L, 5L, 3L, 
12L, 5L, 4L, 3L, 2L, 4L, 3L, 4L, 2L, 1L, 6L, 1L, 4L, 12L, 3L, 
4L, 3L, 2L, 6L, 5L, 4L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 5L, 4L, 1L, 
3L, 3L, 4L, 0L, 3L)), class = "data.frame", row.names = c(1L, 
2L, 3L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 44L, 45L, 46L, 47L, 48L, 
49L, 50L, 51L, 52L, 53L, 54L, 55L, 57L, 59L, 60L, 61L, 62L, 63L, 
64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 87L, 89L, 90L, 91L, 
92L, 93L, 94L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 
105L, 106L, 107L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 
117L, 118L, 119L, 120L, 121L, 123L, 124L, 125L, 126L, 127L, 128L, 
130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 
141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L, 
152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L, 
163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L, 
174L, 175L))

What about something like this:

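# duplicate the original data
p1 <- p
# relabel every row of the copy so that it forms the extra "all" group
p1$WHO.Grade <- 'all'
# stack the copy on the original; WHO.Grade is now a character column
p <- rbind(p1,p)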

library(ggplot2)
ggplot(p) +
geom_boxplot(aes(as.factor(WHO.Grade),
                  y = ki67pro,
                  fill = factor(recurrence) ,
                  color = factor(recurrence) ),
             outlier.alpha = 0 , position = position_dodge(width = 0.78)) +
# from here it's more or less your code
scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(as.factor(WHO.Grade),
                  y = ki67pro,
                  color = factor(recurrence) ),
              geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(as.factor(WHO.Grade),
               y = ki67pro,
              color = factor(recurrence) ),
           size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + 
theme(legend.position="none",
      panel.background = element_blank(),
      axis.line = element_line(colour = "black")) 
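The x-axis order above relies on as.factor() sorting 'all' after "1", "2" and "3". If you would rather not depend on alphabetical sorting, you can set the levels yourself. A minimal sketch, assuming only the first 12 rows of the question's data (to keep it short) and using hypothetical names p_all and p_long:

```r
# First 12 rows of the question's data, kept small for illustration
p <- data.frame(
  WHO.Grade  = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 1L, 1L),
  recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L),
  ki67pro    = c(1L, 12L, 3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L)
)

# Copy of the data in which every row belongs to the extra "All" group
p_all <- transform(p, WHO.Grade = "All")

# Stack original and copy; convert the grade to character first so both
# pieces have the same column type
p_long <- rbind(transform(p, WHO.Grade = as.character(WHO.Grade)), p_all)

# Fix the level order and the axis labels explicitly
p_long$WHO.Grade <- factor(p_long$WHO.Grade,
                           levels = c("1", "2", "3", "All"),
                           labels = c("WHO-I", "WHO-II", "WHO-III", "All"))

levels(p_long$WHO.Grade)
```

With the labels stored in the factor itself, the label argument of scale_x_discrete() should no longer be needed, and "All" stays last whatever the group is called.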


One trick is to add a temporary fourth level to WHO.Grade, which has only three levels, and to assign a copy of all rows to it. Because this level should be temporary, a convenient way to create it is inside a pipeline, with the function mutate from the package dplyr.

Note that there is no need to create a new data frame df.

library(ggplot2)
library(dplyr)

p %>%
  bind_rows(p %>% mutate(WHO.Grade = 4)) %>%
  mutate(WHO.Grade = factor(WHO.Grade),
         recurrence = factor(recurrence)) %>%
  ggplot(aes(WHO.Grade, ki67pro, 
             fill = recurrence, colour = recurrence)) +
  geom_boxplot(outlier.alpha = 0, 
               position = position_dodge(width = 0.78, preserve = "single")) +
  geom_point(size = 3, shape = 21, 
             position = position_jitterdodge()) +
  scale_x_discrete(name = "", 
                   label = c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name = "x", breaks=seq(0,30,5), limits=c(0,30)) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + 
  theme(legend.position="none")
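The reason preserve = "single" is useful here is that some grades have only one of the two recurrence groups (the question mentions a single WHO-III case), and without it a lone box would be stretched to the full dodge width. A quick way to see which grades are affected is to tabulate grade against recurrence. A minimal base-R sketch, assuming only the first 12 rows of the question's data; counts and grades_with_one_box are names made up for this example:

```r
# First 12 rows of the question's data, kept small for illustration
p <- data.frame(
  WHO.Grade  = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 1L, 1L),
  recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L),
  ki67pro    = c(1L, 12L, 3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L)
)

# Number of samples per grade and recurrence status
counts <- table(grade = p$WHO.Grade, recurrence = p$recurrence)
print(counts)

# Grades in which only one recurrence group is present, i.e. where
# only a single box will be drawn
grades_with_one_box <- rownames(counts)[rowSums(counts > 0) == 1]
grades_with_one_box
```

In this 12-row subset grade 2 happens to have no recurrences either; on the full data the table will look different, so run it on your own p.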


If your dataset is too large to simply double in size, you can create two plots and place them next to each other with grid.arrange().

library(ggplot2)
library(gridExtra)


#the data
df <- data.frame(x = as.factor(c(p$WHO.Grade)),
                 y = p$ki67pro,
                 f = as.factor(p$recurrence))

df <- df[!is.na(df$x),]


# plot 1  

plot1 <- ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "", labels = c("WHO-I","WHO-II","WHO-III")) +  # plot1 has only the three grades; "All" is drawn in plot2
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none") +
  theme(plot.margin = unit(c(1,-0.5,1, 1), "cm"))


#plot 2

plot2 <- ggplot(df) +
  geom_boxplot(aes(x = "All", y = y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "") +
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x = "All", y = y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x = "All", y = y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none") +
  theme(axis.line.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        plot.margin = unit(c(1,1,1, -0.5), "cm"))

#put it together

lm <- rbind(c(1,1,1,2))

grid.arrange(plot1, plot2, layout_matrix = lm)
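The layout matrix decides how much horizontal space each plot gets: every cell is one equally wide column, and the number in the cell says which plot occupies it. A small base-R sketch of that arithmetic (share is a name made up for this example):

```r
# One row, four equally wide columns: plot 1 fills three, plot 2 fills one
lm <- rbind(c(1, 1, 1, 2))

# Fraction of the total width taken by each plot
share <- table(lm) / ncol(lm)
share
```

So plot1 takes three quarters of the width and plot2 one quarter, which keeps the "All" boxes roughly as wide as the per-grade ones. As far as I know, passing ncol = 2 and widths = c(3, 1) to grid.arrange() should give the same split.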


If I understood correctly, you just want to show all of your data in the last boxplot. You can do this easily by duplicating the data while creating the data frame and labelling the duplicate with All.

df <- data.frame(x = as.factor(c(p$WHO.Grade, rep("All", nrow(p)))),
                 y = rep(c(p$ki67pro), 2),
                 f = "ki67pro")

The plotting remains the same and you can easily add recurrence. However, the plot you're showing above looks weird, as the All boxplot doesn't contain all the data.
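Adding recurrence could look roughly like this: the recurrence column is duplicated in the same way as the ki67pro values and used as the grouping variable f. This is a sketch under the assumption that only the first 12 rows of your data are used (to keep it short); the explicit levels are my addition to make sure All ends up last.

```r
# First 12 rows of the question's data, kept small for illustration
p <- data.frame(
  WHO.Grade  = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 1L, 1L),
  recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L),
  ki67pro    = c(1L, 12L, 3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L)
)

df <- data.frame(
  # grades followed by one "All" entry per sample
  x = factor(c(p$WHO.Grade, rep("All", nrow(p))),
             levels = c("1", "2", "3", "All")),
  # each value appears twice: once under its grade, once under "All"
  y = rep(p$ki67pro, 2),
  # recurrence status, duplicated the same way, drives fill and colour
  f = factor(rep(p$recurrence, 2))
)

# The "All" group has the same recurrence split as the whole data set
table(df$f[df$x == "All"])
```

With this df, your original ggplot call (aes(x, y, fill = f, colour = f)) should work unchanged and give two boxes per x position.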
