I have a dataset with three different groups of individuals; let's call them Green, Red, and Blue. For each individual in each group, I have readings for 92 proteins in their blood.
I would like to get a good overview of the variances and means for each protein in each group, so I would like to make a graph with multiple box plots.
I would like to have the different proteins on the x-axis, and three box plots (preferably in different colors) (one for each group) above every protein, with numeric protein weight on the y-axis.
How do I do this?
I am currently working with a data frame where each row is an individual (labelled with its group) and each column holds the readings for one protein.
I tried to add a picture, but apparently that requires reputation points.
I've heard that you can use the melt command in reshape2, but I need guidance on how to use it.
Please keep the answers simple. I'm not very experienced with R.
Look, I realize things are frustrating when you are first getting started, but you're going to have to ask specific and targeted questions for people to be willing and able to help you out in a structured way.
Having said that, let's walk through a structured example. I am only going to use 9 proteins here, but you should get the idea.
library(ggplot2)
library(reshape2)
# Set up a data frame, since the question did not provide one...
df <- structure(list(Individual = 1:12,
                     Group = structure(c(2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L),
                                       .Label = c("Blue", "Green", "Red"), class = "factor"),
                     Protein_1 = c(82L, 23L, 19L, 100L, 33L, 86L, 32L, 41L, 39L, 59L, 93L, 99L),
                     Protein_2 = c(86L, 50L, 86L, 90L, 37L, 20L, 26L, 38L, 87L, 81L, 23L, 49L),
                     Protein_3 = c(81L, 31L, 5L, 10L, 79L, 40L, 27L, 73L, 64L, 30L, 87L, 64L),
                     Protein_4 = c(52L, 15L, 25L, 12L, 63L, 52L, 60L, 33L, 27L, 32L, 53L, 93L),
                     Protein_5 = c(19L, 75L, 25L, 14L, 33L, 60L, 73L, 13L, 92L, 92L, 91L, 12L),
                     Protein_6 = c(33L, 49L, 29L, 58L, 51L, 12L, 61L, 48L, 71L, 18L, 84L, 31L),
                     Protein_7 = c(84L, 57L, 28L, 99L, 47L, 54L, 72L, 97L, 73L, 46L, 68L, 37L),
                     Protein_8 = c(15L, 16L, 46L, 95L, 57L, 86L, 30L, 83L, 45L, 12L, 49L, 82L),
                     Protein_9 = c(84L, 91L, 33L, 10L, 91L, 91L, 4L, 88L, 42L, 82L, 76L, 95L)),
                .Names = c("Individual", "Group", "Protein_1", "Protein_2", "Protein_3",
                           "Protein_4", "Protein_5", "Protein_6", "Protein_7", "Protein_8", "Protein_9"),
                class = "data.frame", row.names = c(NA, -12L))
head(df)
# Individual Group Protein_1 Protein_2 Protein_3 Protein_4 Protein_5 Protein_6 Protein_7 Protein_8 Protein_9
# 1 1 Green 82 86 81 52 19 33 84 15 84
# 2 2 Blue 23 50 31 15 75 49 57 16 91
# 3 3 Red 19 86 5 25 25 29 28 46 33
# 4 4 Green 100 90 10 12 14 58 99 95 10
# 5 5 Blue 33 37 79 63 33 51 47 57 91
# 6 6 Red 86 20 40 52 60 12 54 86 91
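# Open the help page for melt(), then reshape from wide to long:
# Individual and Group stay as identifier columns, every protein column
# is stacked into a "variable" (protein name) / "value" (reading) pair
?melt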
df.melted <- melt(df, id.vars = c("Individual", "Group"))
head(df.melted)
# Individual Group variable value
# 1 1 Green Protein_1 82
# 2 2 Blue Protein_1 23
# 3 3 Red Protein_1 19
# 4 4 Green Protein_1 100
# 5 5 Blue Protein_1 33
# 6 6 Red Protein_1 86
# First protein
# Notice I am using subset() to keep only the rows for one protein
ggplot(data = subset(df.melted, variable == "Protein_1"),
       aes(x = Group, y = value)) +
  geom_boxplot(aes(fill = Group))
# Second protein
ggplot(data = subset(df.melted, variable == "Protein_2"),
       aes(x = Group, y = value)) +
  geom_boxplot(aes(fill = Group))
# and so on...
# You could also use facets to get one panel per protein
ggplot(data = df.melted, aes(x = Group, y = value)) +
  geom_boxplot(aes(fill = Group)) +
  facet_wrap(~ variable)
And yes, I realize that the group names (Green, Red, Blue) do not match the fill colors in the plots. I will leave that as an exercise. You have to be willing to tinker, explore, and fail many times.
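For what it is worth, here is a minimal sketch of one way to approach that exercise, and to get the layout the question actually asked for (proteins on the x-axis, one box per group above each protein). The `toy` data frame, its random values, and the axis labels are made up for illustration; the fill colors are mapped to the group names with `scale_fill_manual()`.

```r
library(ggplot2)
library(reshape2)

# Hypothetical toy data: 12 individuals, 3 groups, 4 proteins (random values)
set.seed(1)
toy <- data.frame(
  Individual = 1:12,
  Group      = factor(rep(c("Green", "Blue", "Red"), times = 4)),
  Protein_1  = sample(1:100, 12),
  Protein_2  = sample(1:100, 12),
  Protein_3  = sample(1:100, 12),
  Protein_4  = sample(1:100, 12)
)

# Wide to long: one row per individual/protein reading
toy.melted <- melt(toy, id.vars = c("Individual", "Group"))

# Proteins on the x-axis; mapping fill to Group draws the three
# group boxes side by side within each protein
p <- ggplot(toy.melted, aes(x = variable, y = value, fill = Group)) +
  geom_boxplot() +
  # Named vector: each group level gets the color of the same name
  scale_fill_manual(values = c(Blue = "blue", Green = "green", Red = "red")) +
  labs(x = "Protein", y = "Reading")

print(p)
```

With all 92 proteins the x-axis will get crowded, so you may want to rotate the labels with something like `theme(axis.text.x = element_text(angle = 90, hjust = 1))`, or plot the proteins in smaller batches using `subset()` as above.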