[英]How to add different boxplots to the same plot based on different data sources in ggplot /R?
請在下面找到My Data
。 請注意,下圖是我希望復制的設計示例,並且與My Data
無關。
My Data
存儲在p
。 我有一個連續的協變量p$ki67pro
,它命名在腫瘤樣本中活躍分裂的細胞百分比(因此,范圍從0到100)。 我有三個不同的腫瘤階段,對應於p$WHO.Grade==1,2,3
。 每個樣本代表腫瘤患者復發( p$recurrence==1
)或不p$recurrence==0
( p$recurrence==0
)。
因此:
head(p)
WHO.Grade recurrence ki67pro
1 1 0 1
2 2 0 12
3 1 0 3
9 1 0 3
10 1 0 5
11 1 0 3
我想制作下面的箱線圖。 如您所見,有四個點對應於每個p$WHO.Grade
和All samples
。 每p$WHO.Grade
+ All
有兩個p$WHO.Grade
。
按照p$WHO.Grade
和All
,我想要一個箱圖代表p$ki67pro
用於復發性腫瘤( p$recurrence==1
),另一個箱圖代表p$ki67pro
用於非復發性腫瘤( p$recurrence==0
)。
即
p$ki67pro[p$WHO.Grade==1 & p$recurrence==0]
與p$ki67pro[p$WHO.Grade==1 & p$recurrence==1]
p$ki67pro[p$WHO.Grade==2 & p$recurrence==0]
與p$ki67pro[p$WHO.Grade==2 & p$recurrence==1]
p$ki67pro[p$WHO.Grade==3 & p$recurrence==0]
與p$ki67pro[p$WHO.Grade==3 & p$recurrence==1]
並為All
p$ki67pro[p$recurrence==0]
與p$ki67pro[p$recurrence==1]
到目前為止,我已經使用了以下腳本,但我可以弄清楚如何獲得All
。 請注意,只有一個案例p$WHO.Grade==3
df <- data.frame(x = as.factor(c(p$WHO.Grade)),
y = c(p$ki67pro),
f = rep(c("ki67pro"), c(nrow(p))))
df <- df[!is.na(df$x),]
ggplot(df) +
geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
labels = c("","")) + theme(legend.position="none")
My Data p
p <- structure(list(WHO.Grade = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), recurrence = c(0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), ki67pro = c(1L, 12L,
3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L, 3L, 15L, 4L, 5L, 7L,
8L, 3L, 12L, 10L, 4L, 10L, 7L, 3L, 2L, 3L, 7L, 4L, 7L, 10L, 4L,
5L, 5L, 3L, 5L, 2L, 5L, 3L, 3L, 3L, 4L, 4L, 3L, 2L, 5L, 1L, 5L,
2L, 3L, 1L, 2L, 3L, 3L, 5L, 4L, 20L, 5L, 0L, 4L, 3L, 0L, 3L,
4L, 1L, 2L, 20L, 2L, 3L, 5L, 4L, 8L, 1L, 4L, 5L, 4L, 3L, 6L,
12L, 3L, 4L, 4L, 2L, 5L, 3L, 3L, 3L, 2L, 5L, 4L, 2L, 3L, 4L,
3L, 3L, 2L, 2L, 4L, 7L, 4L, 3L, 4L, 2L, 3L, 6L, 2L, 3L, 10L,
5L, 10L, 3L, 10L, 3L, 4L, 5L, 2L, 4L, 3L, 4L, 4L, 4L, 5L, 3L,
12L, 5L, 4L, 3L, 2L, 4L, 3L, 4L, 2L, 1L, 6L, 1L, 4L, 12L, 3L,
4L, 3L, 2L, 6L, 5L, 4L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 5L, 4L, 1L,
3L, 3L, 4L, 0L, 3L)), class = "data.frame", row.names = c(1L,
2L, 3L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 44L, 45L, 46L, 47L, 48L,
49L, 50L, 51L, 52L, 53L, 54L, 55L, 57L, 59L, 60L, 61L, 62L, 63L,
64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L,
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 87L, 89L, 90L, 91L,
92L, 93L, 94L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L,
105L, 106L, 107L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L,
117L, 118L, 119L, 120L, 121L, 123L, 124L, 125L, 126L, 127L, 128L,
130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L,
141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L,
152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L,
163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L,
174L, 175L))
這樣的事情怎么樣:
# here you duplicate your original data
p1 <- p
# how to catch the all
p1$WHO.Grade <- 'all'
p <- rbind(p1,p)
library(ggplot2)
ggplot(p) +
geom_boxplot(aes(as.factor(WHO.Grade),
y = ki67pro,
fill = factor(recurrence) ,
color = factor(recurrence) ),
outlier.alpha = 0 , position = position_dodge(width = 0.78)) +
# from here it's more or less your code
scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(as.factor(WHO.Grade),
y = ki67pro,
color = factor(recurrence) ),
geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(as.factor(WHO.Grade),
y = ki67pro,
color = factor(recurrence) ),
size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
labels = c("","")) +
theme(legend.position="none",
panel.background = element_blank(),
axis.line = element_line(colour = "black"))
可以使用的一個技巧是在WHO.Grade
創建一個新級別,因為它只有3個級別。 這應該是一個臨時級別,所以一個好方法是使用包dplyr
,函數mutate
。
請注意,無需創建新的數據幀df
。
library(ggplot2)
library(dplyr)
p %>%
bind_rows(p %>% mutate(WHO.Grade = 4)) %>%
mutate(WHO.Grade = factor(WHO.Grade),
recurrence = factor(recurrence)) %>%
ggplot(aes(WHO.Grade, ki67pro,
fill = recurrence, colour = recurrence)) +
geom_boxplot(outlier.alpha = 0,
position = position_dodge(width = 0.78, preserve = "single")) +
geom_point(size = 3, shape = 21,
position = position_jitterdodge()) +
scale_x_discrete(name = "",
label = c("WHO-I","WHO-II","WHO-III","All")) +
scale_y_continuous(name = "x", breaks=seq(0,30,5), limits=c(0,30)) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
labels = c("","")) +
theme(legend.position="none")
如果您的數據集太大而無法將其大小加倍,您可以創建兩個圖並通過grid.arrange()
將它們放在一起。
library(ggplot2)
library(gridExtra)
#the data
df <- data.frame(x = as.factor(c(p$WHO.Grade)),
y = p$ki67pro,
f = as.factor(p$recurrence))
df <- df[!is.na(df$x),]
# plot 1
plot1 <- ggplot(df) +
geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
labels = c("","")) + theme(legend.position="none") +
theme(plot.margin = unit(c(1,-0.5,1, 1), "cm"))
#plot 2
plot2 <- ggplot(df) +
geom_boxplot(aes(x = "All", y = y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
scale_x_discrete(name = "") +
scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(x = "All", y = y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(x = "All", y = y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
labels = c("","")) + theme(legend.position="none") +
theme(axis.line.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(1,1,1, -0.5), "cm"))
#put it together
lm <- rbind(c(1,1,1,2))
grid.arrange(plot1, plot2, layout_matrix = lm)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.