简体   繁体   中英

Adding several box plots in one

I have a dataset where I have three different groups of individuals, let´s call them Green, Red, and Blue. Then I have data covering 92 proteins in their blood, from which I have readings for each individual in each group.

I would like to get a good overview of the variances and means for each protein for each group. Which means that I would like to make a multiple box plot graph.

I would like to have the different proteins on the x-axis, and three box plots (preferably in different colors) (one for each group) above every protein, with numeric protein weight on the y-axis.

How do I do this?

I am currently working with a data frame where the groups are divided by the rows, and the different protein readings is in each column.

Tried to add a picture, but apparently you need reputation-points…

I´ve heard that you can use the melt command in reshape2, but I need guidance in how to use it.

Please, simplify the answers. I´m not very experienced when it comes to R.

Look, I realize things are frustrating when you are first getting started, but you're going to have to ask specific and targeted questions for people to be willing and able to help you out in a structured way.

Having said that, let's walk through a structured example. I am only going to use 9 proteins here, but you should get the idea.

library(ggplot2)
library(reshape2)

# Setup a data frame, since the question did not provide one...
df <- structure(list(Individual = 1:12, 
                     Group = structure(c(2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L), 
                              .Label = c("Blue", "Green", "Red"), class = "factor"), 
                     Protein_1 = c(82L, 23L, 19L, 100L, 33L, 86L, 32L, 41L, 39L, 59L, 93L, 99L), 
                     Protein_2 = c(86L, 50L, 86L, 90L, 37L, 20L, 26L, 38L, 87L, 81L, 23L, 49L), 
                     Protein_3 = c(81L, 31L, 5L, 10L, 79L, 40L, 27L, 73L, 64L, 30L, 87L, 64L), 
                     Protein_4 = c(52L, 15L, 25L, 12L, 63L, 52L, 60L, 33L, 27L, 32L, 53L, 93L), 
                     Protein_5 = c(19L, 75L, 25L, 14L, 33L, 60L, 73L, 13L, 92L, 92L, 91L, 12L), 
                     Protein_6 = c(33L, 49L, 29L, 58L, 51L, 12L, 61L, 48L, 71L, 18L, 84L, 31L), 
                     Protein_7 = c(84L, 57L, 28L, 99L, 47L, 54L, 72L, 97L, 73L, 46L, 68L, 37L), 
                     Protein_8 = c(15L, 16L, 46L, 95L, 57L, 86L, 30L, 83L, 45L, 12L, 49L, 82L), 
                     Protein_9 = c(84L, 91L, 33L, 10L, 91L, 91L, 4L, 88L, 42L, 82L, 76L, 95L)), 
                .Names = c("Individual", "Group", "Protein_1", "Protein_2", "Protein_3", 
                           "Protein_4", "Protein_5", "Protein_6", "Protein_7", "Protein_8", "Protein_9"), 
                class = "data.frame", row.names = c(NA, -12L))

head(df)
# Individual Group Protein_1 Protein_2 Protein_3 Protein_4 Protein_5 Protein_6 Protein_7 Protein_8 Protein_9
# 1          1 Green        82        86        81        52        19        33        84        15        84
# 2          2  Blue        23        50        31        15        75        49        57        16        91
# 3          3   Red        19        86         5        25        25        29        28        46        33
# 4          4 Green       100        90        10        12        14        58        99        95        10
# 5          5  Blue        33        37        79        63        33        51        47        57        91
# 6          6   Red        86        20        40        52        60        12        54        86        91
?melt
df.melted <- melt(df, id.vars = c("Individual", "Group"))
head(df.melted)
# Individual Group  variable value
# 1          1 Green Protein_1    82
# 2          2  Blue Protein_1    23
# 3          3   Red Protein_1    19
# 4          4 Green Protein_1   100
# 5          5  Blue Protein_1    33
# 6          6   Red Protein_1    86

# First Protein
# Notice I am using subset()
ggplot(data = subset(df.melted, variable == "Protein_1"),
       aes(x = Group, y = value)) + geom_boxplot(aes(fill = Group))

蛋白质1

# Second Protein
ggplot(data = subset(df.melted, variable == "Protein_2"),
       aes(x = Group, y = value)) + geom_boxplot(aes(fill = Group))

蛋白质2

# and so on...

# You could also use facets
ggplot(data = df.melted, aes(x = Group, y = value)) + 
  geom_boxplot(aes(fill = Group)) +
  facet_wrap(~ variable)

面

And yes, I realize that the color groupings do not align with the colors of the plot...I will leave that as an exercise... You have to be willing to tinker, explore, and fail many times.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM