How to group data and then draw a bar chart in ggplot2
I have a data frame (df) with 3 columns, for example:
NUMERIC1: NUMERIC2: GROUP(CHARACTER):
100 1 A
200 2 B
300 3 C
400 4 A
I want to group NUMERIC1 by GROUP(CHARACTER) and then calculate the mean for each group. Like this:
mean(NUMERIC1): GROUP(CHARACTER):
250 A
200 B
300 C
Finally, I want to draw a bar chart with ggplot2, with GROUP(CHARACTER) on the x-axis and mean(NUMERIC1) on the y-axis.
I used
mean <- tapply(df$NUMERIC1, df$GROUP(CHARACTER), FUN=mean)
but I'm not sure whether that is right, and even if it is, I don't know what to do next.
This is what stat_summary(...) is designed for:
colnames(df) <- c("N1","N2","GROUP")
library(ggplot2)
# fun.y was deprecated in ggplot2 3.3.0; use fun instead
ggplot(df) + stat_summary(aes(x=GROUP,y=N1),fun=mean,geom="bar",
fill="lightblue",col="grey50")
Try something like:
res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)
ggplot(res, aes(x = GROUP, y = NUMERIC1)) + geom_bar(stat = "identity")
df <- structure(list(NUMERIC1 = c(100L, 200L, 300L, 400L), NUMERIC2 = 1:4,
GROUP = structure(c(1L, 2L, 3L, 1L), .Label = c("A", "B",
"C"), class = "factor")), .Names = c("NUMERIC1", "NUMERIC2",
"GROUP"), class = "data.frame", row.names = c(NA, -4L))
I would suggest something like this:
#Imports: data.table, which allows for really convenient "apply a function to
#each part of a df, by unique value", and ggplot2
library(data.table)
library(ggplot2)
#Convert df to a data.table. It remains a data.frame, so any function that works
#on a data.frame can still work here.
data <- as.data.table(df)
#By each unique value in GROUP (assuming that is the name of the grouping
#column), subset and calculate the mean of the NUMERIC1 values within that
#subset. You end up with a data.frame/data.table with the columns GROUP and
#mean_value
data <- data[, j = list(mean_value = mean(NUMERIC1)), by = "GROUP"]
#And now we play the plotting game (the plotting game is boring, let's
#play Hungry Hungry Hippos!)
#stat = "identity" is needed because the bar heights are already computed;
#the default stat counts rows and rejects a y aesthetic
plot <- ggplot(data, aes(GROUP, mean_value)) + geom_bar(stat = "identity")
#And that should do it.
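Here's a solution using dplyr to create the summary. In this case, the summary is created on the fly inside the ggplot call, but you can also create a separate summary data frame first and then feed that to ggplot.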
library(dplyr)
library(ggplot2)
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_bar(stat="identity", fill=hcl(195,100,65))
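The alternative mentioned above, building the summary data frame first and then passing it to ggplot, might look roughly like this. This is only a sketch: the sample data is rebuilt from the question's table, and the names group_means and mean_numeric1 are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question (4 rows, 3 groups)
df <- data.frame(
  NUMERIC1 = c(100, 200, 300, 400),
  NUMERIC2 = 1:4,
  GROUP    = c("A", "B", "C", "A")
)

# Step 1: one row per group, holding the mean of NUMERIC1
# (group_means and mean_numeric1 are hypothetical names)
group_means <- df %>%
  group_by(GROUP) %>%
  summarise(mean_numeric1 = mean(NUMERIC1))

# Step 2: plot the precomputed means; stat = "identity" uses the
# y values as bar heights instead of counting rows
p <- ggplot(group_means, aes(GROUP, mean_numeric1)) +
  geom_bar(stat = "identity", fill = hcl(195, 100, 65))
print(p)
```

Keeping the summary in its own object makes it easy to inspect the means before plotting, or to reuse them elsewhere.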
Since you're plotting means rather than counts, it might make more sense to use points instead of bars. For example:
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_point(pch=21, size=5, fill="blue") +
coord_cartesian(ylim=c(0,310))
Why use ggplot when you can do the same thing with your own code and barplot:
barplot(tapply(df$NUMERIC1, df$GROUP, FUN=mean))
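For anyone who wants to run this from scratch, here is a small self-contained sketch of the same base R approach. The sample data is rebuilt from the question's table, group_means is just an illustrative name, and the axis labels are my own addition.

```r
# Sample data from the question (4 rows, 3 groups)
df <- data.frame(
  NUMERIC1 = c(100, 200, 300, 400),
  NUMERIC2 = 1:4,
  GROUP    = c("A", "B", "C", "A")
)

# tapply returns the group means, named by the values of GROUP
# (group_means is a hypothetical name)
group_means <- tapply(df$NUMERIC1, df$GROUP, FUN = mean)

# barplot uses those names as the bar labels on the x-axis
barplot(group_means, xlab = "GROUP", ylab = "mean(NUMERIC1)")
```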