I want to create a violin plot using ggplot2. The y axis should be the number of occurrences. This can be represented either by N in df2 or, alternatively, by the number of unique IDs mapping to each group in df1. The x axis should be Freq in df1. The densities should correspond to each group's (A, B, C) range of occurrences (df2$N).
#dummy data
df1 <- read.table(text=" ID Freq group
ind00001 1 A
ind00001 3 A
ind00001 12 B
ind00001 19 B
ind00001 33 C
ind00001 2 A
ind00003 1 A
ind00003 32 C
ind00003 20 B
ind00003 12 B
ind00003 4 B
ind00003 3 A
ind00003 4 B
ind00006 2 A
ind00006 11 B
ind00006 1 A
ind00006 34 C
ind00006 1 A
ind00006 5 B
ind00013 1 A
ind00013 5 B
ind00013 6 B
ind00013 11 B
ind00013 6 B
ind00013 10 B
ind00013 1 A
ind00015 2 A
ind00015 10 B
ind00015 33 C
ind00015 5 B
ind00022 1 A
ind00022 8 B
ind00022 26 B
ind00022 4 B
ind00048 2 A
ind00048 9 B
ind00048 30 B
ind00048 6 B
ind00068 2 A
ind00068 10 B
ind00084 1 A
ind00084 1 A
ind00084 4 B
ind00084 1 A
ind00084 2 A
ind00089 3 A
ind00089 30 B
ind00104 2 A
ind00104 2 A
ind00104 1 A
ind00104 6 B
ind00104 4 B
ind00104 4 B
ind00106 2 A
ind00106 1 A
ind00106 10 B
ind00106 3 A
ind00106 2 A
ind00118 2 A
ind00118 2 A
ind00118 6 B
ind00118 19 B
ind00118 3 A
ind00118 2 A
ind00123 3 A
ind00123 2 A
ind00123 1 A
ind00123 3 A
ind00123 4 B
ind00123 31 C
ind00130 1 A
ind00130 2 A
ind00130 1 A
ind00130 19 B
ind00130 3 A
ind00130 2 A
ind00138 3 A
ind00138 7 B
ind00138 1 A
ind00138 3 A
ind00138 5 B
ind00138 10 B
ind00138 25 B
ind00148 2 A
ind00148 3 A
ind00148 3 A
ind00148 3 A
ind00148 19 B
ind00149 3 A
ind00149 1 A
ind00149 5 B
ind00156 1 A
ind00156 2 A
ind00156 9 B
ind00156 2 A
ind00169 3 A
ind00169 3 A
ind00169 2 A
ind00169 4 B
ind00169 3 A", header=T)
df2 <- read.table(text="N group ID
3 A ind00001
2 B ind00001
1 C ind00001
1 A ind00002
2 B ind00002
1 C ind00002
2 A ind00003
4 B ind00003
1 C ind00003
3 B ind00004
1 C ind00004
1 B ind00005
1 C ind00005
3 A ind00006
2 B ind00006
1 C ind00006
1 A ind00007
1 B ind00007
1 C ind00007
2 A ind00008
3 B ind00008
1 C ind00008
1 A ind00009
3 B ind00009
1 A ind00010
2 B ind00010
1 C ind00010
1 A ind00011
1 B ind00011
1 C ind00011
1 A ind00012
4 B ind00012
1 C ind00012
2 A ind00013
5 B ind00013
1 A ind00014
2 B ind00014
1 C ind00014
1 A ind00015
2 B ind00015
1 C ind00015
3 B ind00016
1 C ind00016
3 B ind00017
1 C ind00017
2 A ind00018
2 B ind00018
2 B ind00019
1 C ind00019
2 A ind00020
1 B ind00020
1 A ind00021
2 B ind00021
1 C ind00021
1 A ind00022
3 B ind00022
2 A ind00023
3 B ind00023
1 C ind00023
2 B ind00024
1 C ind00024
6 B ind00025
1 C ind00025
1 A ind00026
2 B ind00026
1 C ind00026
1 A ind00027
1 B ind00027
1 C ind00027
1 A ind00028
2 B ind00028
1 C ind00028
1 A ind00029
1 B ind00029
1 C ind00029
1 A ind00030
3 B ind00030
1 C ind00030
6 B ind00031
1 C ind00031
2 A ind00032
1 B ind00032
1 A ind00033
4 B ind00033
3 B ind00034
1 C ind00034
2 A ind00035
1 B ind00035
1 A ind00036
1 B ind00036
1 A ind00037
3 B ind00037
1 C ind00037
1 A ind00038
4 B ind00038
1 A ind00039
3 B ind00039
1 A ind00040
2 B ind00040
2 B ind00041", header=T)
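For the IDs that appear in both tables, N in df2 appears to be the number of rows per ID and group in df1 (for example, ind00001 has three A rows, two B rows and one C row). A minimal sketch of deriving such a count, assuming that relationship holds; toy_df1 and counts are hypothetical names and the data is only the ind00001 subset of df1:

```r
# Toy subset of df1: the six rows for ind00001 from the dummy data above
toy_df1 <- data.frame(
  ID    = rep("ind00001", 6),
  Freq  = c(1, 3, 12, 19, 33, 2),
  group = c("A", "A", "B", "B", "C", "A")
)

# Count rows per ID and group (length of Freq within each combination)
counts <- aggregate(Freq ~ ID + group, data = toy_df1, FUN = length)

# Rename the count column to N so it mirrors the layout of df2
names(counts)[names(counts) == "Freq"] <- "N"

print(counts)
```

For this subset the counts are 3, 2 and 1 for A, B and C, which matches the ind00001 rows of df2.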
I tried this for plotting, but it (obviously) yields an incorrect plot.
require(ggplot2)
require(qpcR)
ggplot(
  data.frame(qpcR:::cbind.na(x = df1$Freq, y = df2$N, group = df1$group)),
  aes(x = x, y = y, group = group, fill = group)
) +
  geom_violin() +
  theme_bw()
The correct plot should have densities, by groups A, B, C, corresponding to their number of occurrences (df2$N).
E.g., group C (light blue, or 3 in the plot) should not exceed the value 1 on the y axis, as seen below.
Any pointer would be highly appreciated, thanks!
# C only has df2$N == 1
subset(df2, group %in% "C")
N group ID
1 C ind00001
1 C ind00002
1 C ind00003
1 C ind00004
1 C ind00005
1 C ind00006
1 C ind00007
1 C ind00008
1 C ind00010
1 C ind00011
1 C ind00012
1 C ind00014
1 C ind00015
1 C ind00016
1 C ind00017
1 C ind00019
1 C ind00021
1 C ind00023
1 C ind00024
1 C ind00025
1 C ind00026
1 C ind00027
1 C ind00028
1 C ind00029
1 C ind00030
1 C ind00031
1 C ind00034
1 C ind00037
# B has df2$N ranging from 1 to 6
subset(df2, group %in% "B")
N group ID
2 B ind00001
2 B ind00002
4 B ind00003
3 B ind00004
1 B ind00005
2 B ind00006
1 B ind00007
3 B ind00008
3 B ind00009
2 B ind00010
1 B ind00011
4 B ind00012
5 B ind00013
2 B ind00014
2 B ind00015
3 B ind00016
3 B ind00017
2 B ind00018
2 B ind00019
1 B ind00020
2 B ind00021
3 B ind00022
3 B ind00023
2 B ind00024
6 B ind00025
2 B ind00026
1 B ind00027
2 B ind00028
1 B ind00029
3 B ind00030
6 B ind00031
1 B ind00032
4 B ind00033
3 B ind00034
1 B ind00035
1 B ind00036
3 B ind00037
4 B ind00038
3 B ind00039
2 B ind00040
2 B ind00041
# A only has df2$N ranging from 1 to 3
subset(df2, group %in% "A")
N group ID
3 A ind00001
1 A ind00002
2 A ind00003
3 A ind00006
1 A ind00007
2 A ind00008
1 A ind00009
1 A ind00010
1 A ind00011
1 A ind00012
2 A ind00013
1 A ind00014
1 A ind00015
2 A ind00018
2 A ind00020
1 A ind00021
1 A ind00022
2 A ind00023
1 A ind00026
1 A ind00027
1 A ind00028
1 A ind00029
1 A ind00030
2 A ind00032
1 A ind00033
2 A ind00035
1 A ind00036
1 A ind00037
1 A ind00038
1 A ind00039
1 A ind00040
# Inner join: keep only ID/group combinations present in both df1 and df2
plotData <- merge(df1, df2, by = c("ID", "group"), all = FALSE)
All members of group C have the same N, which causes geom_violin to fail. A boxplot is an option:
ggplot(plotData, aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_boxplot() + theme_bw()
Otherwise, remove group=C:
ggplot(plotData[plotData$group!="C",], aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_violin() + theme_bw()
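Before dropping a group by hand, it can help to check which groups have no spread in N at all. A minimal sketch in base R, using a small hypothetical data frame (toy, n_distinct and degenerate are illustrative names, not part of the original data) in which group C has a single N value, as in df2:

```r
# Hypothetical toy data: group C has only N == 1, as in df2
toy <- data.frame(
  N     = c(3, 1, 2, 2, 4, 3, 1, 1, 1),
  group = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
)

# Number of distinct N values within each group
n_distinct <- tapply(toy$N, toy$group, function(x) length(unique(x)))

# Groups with fewer than two distinct values have no spread to estimate a density from
degenerate <- names(n_distinct)[n_distinct < 2]

print(degenerate)
```

Any group reported here is a candidate for a boxplot or plain points instead of a violin.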
Thanks a lot for the great responses. I get a bit closer to my imagined output if I combine @CMichael's two answers.
p1 <- ggplot(subset(plotData, group %in% c("A","B")), aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_violin() + theme_bw()
p1 + geom_boxplot(data=subset(plotData, group %in% c("C")), aes(x=as.factor(group), y=N, group=group, fill=group))
UPDATE
Even closer to the original idea, although the violins are centred on the midpoint of their x interval rather than over the mass of the density.
ggplot(plotData, aes(x = Freq, y = N)) +
  theme_bw() +
  scale_x_continuous(breaks = 1:36) +
  geom_jitter(aes(colour = group)) +
  # violins only for groups A and B; group C is left as jittered points
  geom_violin(data = subset(plotData, group %in% c("A", "B")), alpha = 0, trim = FALSE, aes(group = group)) +
  scale_y_continuous(breaks = 1:7)