
Grouped Bar Chart of Means in R

I have a data set (learner) with student test scores (learner$literacy_total), their grade level (learner$learner_grade, i.e. grades 1, 2, 3, ..., 12), and their gender (learner$gender). I'd like to create a bar plot with grade on the x axis and the average score on the y axis, with two bars for each grade (one for males and one for females), so I can see how boys and girls do in each grade. I can easily create a plot of the overall average for each grade using the following code:

fig.dist <- split(learner$literacy_total, learner$learner_grade)
fig.mean <- sapply(fig.dist, mean, na.rm = TRUE)
barplot(fig.mean)

But how do I group these so that, for each grade, I can see the average test scores for boys and girls separately?

In other questions I've seen code that either groups categories or graphs the means, but I'm struggling with how to put the two together.

To extend @detroyejr's answer, consider tapply, which slices a vector by one or more factors and applies a function such as mean to each subset, returning a named vector or matrix.

However, to align with your original overall-mean barplot, transpose the tapply result with t() so that male/female become the row names and grades 1-12 the column names. Then use beside=TRUE for side-by-side (unstacked) bars.

# na.rm = TRUE is passed through to mean(), matching the question's own code
gender.mean <- t(tapply(learner$literacy_total,
                        list(learner$learner_grade, learner$gender),
                        mean, na.rm = TRUE))

barplot(gender.mean, col=c("darkblue","red"), beside=TRUE, legend=rownames(gender.mean))
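As a rough check on why the transpose matters, the sketch below uses a small made-up matrix (the values and the name toy.mean are purely illustrative, not taken from the question's data). With beside=TRUE, barplot draws one cluster per column and one bar per row within each cluster, so the grades need to be the columns.

```r
# Toy matrix in the transposed layout: rows = gender, columns = grade.
# The numbers are invented for illustration only.
toy.mean <- matrix(c(55, 60, 62, 58, 70, 65), nrow = 2,
                   dimnames = list(c("FEMALE", "MALE"), c("1", "2", "3")))

# plot = FALSE returns the bar midpoints without drawing anything
mids <- barplot(toy.mean, beside = TRUE, plot = FALSE)

dim(mids)        # one row per gender, one column per grade
colMeans(mids)   # centre of each grade's cluster, handy for custom axis labels
```

Without t(), the roles would flip: one cluster per gender with a bar for each grade.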

To demonstrate with random data:

set.seed(888)
learner <- data.frame(
  learner_grade = replicate(50, sample(seq(12), 1, replace=TRUE)),
  gender = replicate(50, sample(c("MALE", "FEMALE"), 1, replace=TRUE)),
  literacy_total = abs(rnorm(50)*100)
)

gender.mean <- t(tapply(learner$literacy_total, 
                        list(learner$learner_grade, learner$gender), mean))

barplot(gender.mean, col=c("darkblue","red"), beside=TRUE, legend=rownames(gender.mean))

[Figure: bar chart output]

You can use tapply (see help(tapply) for more info). So, something like this, where df stands for your data set:

tapply(df[["literacy_total"]], list(df[["learner_grade"]], df[["gender"]]), mean)

In this example, tapply essentially splits literacy_total by each available combination of learner_grade and gender and computes the mean value for each group. You can see another example using:

tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)

It's easier to answer if you provide a reproducible example, but this might get you started.
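For a result that can be checked by hand, here is a minimal sketch on six made-up scores (the vectors scores, grade and gender are hypothetical stand-ins for the question's columns). It shows the shape tapply returns when given two grouping factors: a matrix with the first factor as rows and the second as columns.

```r
# Six invented observations: two grades, two genders
scores <- c(10, 20, 30, 40, 50, 60)
grade  <- c(1, 1, 2, 2, 1, 2)
gender <- c("F", "M", "F", "M", "F", "M")

m <- tapply(scores, list(grade, gender), mean)

m["1", "F"]   # mean of 10 and 50 -> 30
m["2", "M"]   # mean of 40 and 60 -> 50
dimnames(m)   # rows are grades "1", "2"; columns are genders "F", "M"
```

A grade/gender combination with no observations would show up as NA in this matrix.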

A solution using ggplot2 and dplyr:

library(ggplot2)
library(dplyr)
# example data (make sure 'sex' and 'grade' are stored as factors)
df <- data.frame(literacy_total = rnorm(300)^2,
                 grade = as.factor(rep(1:10, 30)),
                 sex = as.factor(sample(1:2, 300, replace = TRUE)))

# calculate the means of each combination of 'grade' and 'sex' with `group_by`
means <- df %>% group_by(grade, sex) %>% 
   summarise(mean = mean(literacy_total))

# making the plot
ggplot(means, aes(x = grade, y = mean, fill = sex)) +
    geom_bar(stat = "identity", position = "dodge")
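One possible shortcut, assuming ggplot2 3.3.0 or later (where stat_summary accepts a fun argument), is to let ggplot2 compute the group means itself rather than summarising first. This is only a sketch on made-up data of the same shape as the example; the seed and the object name p are arbitrary choices.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the made-up data is reproducible
df <- data.frame(literacy_total = rnorm(300)^2,
                 grade = as.factor(rep(1:10, 30)),
                 sex = as.factor(sample(1:2, 300, replace = TRUE)))

# stat_summary applies mean() to literacy_total within each grade/sex group
p <- ggplot(df, aes(x = grade, y = literacy_total, fill = sex)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")
print(p)
```

Summarising first with dplyr keeps the means available as a data frame for inspection, while this variant saves a step when only the plot is needed.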

