
ggplot geom_bar where x = multiple columns

How can I make a bar chart where the x values come from multiple columns of a data frame?

Fake data:

data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
                   col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)), 
                   col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
                   col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))

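What I'd like to do is plot the number (ideally the percentage) of Y and N in col4, grouped by col1, col2, and col3.

Overall, if there are 50 rows and 25 of them have Y, I should be able to make a graph like the following:

[Image: bar chart]

I know the basic bar chart in ggplot is: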

ggplot(data, aes(x = col1, fill = col4)) + geom_bar()

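I'm not looking for how many of col4 are found in each col3 for each col2, so I don't think facet_wrap() is the trick, but I don't know what to do instead.

You need to convert your data frame to long format first, then use the newly created variable in facet_wrap():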

library(ggplot2)

# Stack col1, col2 and col3 into key/value pairs, keeping col4 as-is
data_long <- tidyr::gather(data, key = type_col, value = categories, -col4)

ggplot(data_long, aes(x = categories, fill = col4)) +
  geom_bar() + 
  facet_wrap(~ type_col, scales = "free_x")
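
In current tidyr, gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the same reshape follows, assuming tidyr 1.0 or later. Unlike gather(), pivot_longer() refuses to stack character and numeric columns together, so every column is converted to character first. The data_chr name is only for illustration.

```r
library(tidyr)
library(ggplot2)

data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# pivot_longer() will not combine character and numeric columns,
# so convert every column to character before reshaping.
data_chr <- data
data_chr[] <- lapply(data_chr, as.character)

# One row per (original column, value) pair; col4 is carried along.
data_long <- pivot_longer(
  data_chr,
  cols = -col4,
  names_to = "type_col",
  values_to = "categories"
)

# Same plot as above, stored in p; print(p) draws it.
p <- ggplot(data_long, aes(x = categories, fill = col4)) +
  geom_bar() +
  facet_wrap(~ type_col, scales = "free_x")
```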

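A very rough approximation, in the hope that it sparks conversation and/or gives you enough to get started.

Your data is too small to do much with, so I'll expand it.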

set.seed(2)
n <- 100
d <- data.frame(
  cat1 = sample(c('A','B','C'), size=n, replace=TRUE),
  cat2 = sample(c(2012L,2013L,2014L,2015L), size=n, replace=TRUE),
  cat3 = sample(c('^','v','<','>'), size=n, replace=TRUE),
  val = sample(c('X','Y'), size=n, replace=TRUE)
)

I'm using dplyr and tidyr here to reshape the data:

library(ggplot2)
library(dplyr)
library(tidyr)

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  head
# Warning: attributes are not identical across measure variables; they will be dropped
#   val cattype cat
# 1   Y    cat1   A
# 2   Y    cat1   A
# 3   Y    cat1   C
# 4   Y    cat1   C
# 5   Y    cat1   B
# 6   Y    cat1   C

The next trick is to facet it correctly:

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  ggplot(aes(val, fill=cattype)) +
  geom_bar() +
  facet_wrap(~cattype+cat, nrow=1)

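Depending on what you need, you can also use melt from the reshape package to get what you want.

(Note: this solution is very similar to Phil's. If you mapped col4 to the fill, you could convert it to use only the fill, rather than filtering on just "Y" and including a facet wrap.)

Continuing from your data setup: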

library(reshape)
library(ggplot2)

#Reshape the data to sort it by all the other column's categories
data$col2 <- as.factor(as.character(data$col2))

breakdown <- melt(data, "col4")

#Our x values are the individual values, e.g. A, 2012, Down.
#Our fill is what we want it grouped by, in this case variable, which is our col1, col2, col3 (default column name from melt)
ggplot(subset(breakdown, col4 == "Y"), aes(x = value, fill = variable)) +
  geom_bar() +
  # scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Number of Yes's")

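I'm not 100% sure what you're after, but maybe this is closer to it?

EDIT: To show the percentage of Yes's, we can use ddply from the plyr package to build a data frame holding the percentage for each value of each variable, then plot the bars from those values rather than from counts.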

library(plyr)

#ddply applies a function to a data frame grouped by columns.
#In this case we group by variable (col1, col2 or col3) as well as by value.
#The function I apply just calculates the percentage, i.e. number of yeses/number of responses
plot_breakdown <- ddply(breakdown, c("variable", "value"), function(x){sum(x$col4 == "Y")/nrow(x)})

#When we plot we now add y = V1 to plot the percentage response
#Also in geom_bar I've now added stat = 'identity' so it doesn't try and plot counts
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable)) +
  geom_bar(aes(group = factor(variable)), position = "dodge", stat = 'identity') +
  scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Percentage of Yes's") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

The last line I added to the ggplot call is there to make the y axis look more percentage-y :)
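
For comparison, here is a hedged sketch of the same percentage plot using dplyr instead of plyr, with scales::percent producing the axis labels instead of a hand-written label vector. It assumes dplyr 1.0 or later (for across() and .groups) and tidyr 1.0 or later. The names long, share and share_yes are made up for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Long format: one row per (variable, value) pair, with col4 carried along.
long <- data %>%
  mutate(across(everything(), as.character)) %>%
  pivot_longer(-col4, names_to = "variable", values_to = "value")

# Share of "Y" within each value of each variable.
share <- long %>%
  group_by(variable, value) %>%
  summarise(share_yes = mean(col4 == "Y"), .groups = "drop")

# scales::percent turns 0.25 into "25%" on the axis.
p <- ggplot(share, aes(x = value, y = share_yes, fill = variable)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  ylab("Percentage of Yes's")
```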

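In the comments you mentioned that you want this because the sample sizes differ and you'd like some kind of fair comparison across categories. My advice is to be careful here. Percentages look nice, but they can be misleading when the sample size is small: for example, you get 0% Yes when you only have a single response and it is a No. My suggestion is either to exclude the categories whose sample size you consider too small, or to use the colour aesthetic to flag them.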
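
A minimal sketch of the first option, excluding small categories, assuming dplyr and tidyr are available. The threshold of 3 and the names long, sizes and kept are illustrative choices, not something prescribed by the question.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

long <- data %>%
  mutate(across(everything(), as.character)) %>%
  pivot_longer(-col4, names_to = "variable", values_to = "value")

# Number of responses behind each bar.
sizes <- count(long, variable, value, name = "n")

# Keep only categories with at least 3 responses (arbitrary cutoff).
kept <- filter(sizes, n >= 3)

# Drop the small categories from the long data before plotting.
long_kept <- semi_join(long, kept, by = c("variable", "value"))
```

With the example data this drops 2013, Left and Right, which have fewer than three responses each.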

#Adding an extra column using ddply again which generates a 1 if the sample size is less than 3, and a 0 otherwise
plot_breakdown <- cbind(plot_breakdown,
                        too_small = factor(ddply(breakdown, c("variable", "value"), function(x){ifelse(nrow(x)<3,1,0)})[,3]))

#Same ggplot as before, except with a colour variable now too (outside line of bar)
#Because of this I also added a way to customise the colours which display, and the names of the colour legend
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable, colour = too_small)) +
  geom_bar(size = 2, position = "dodge", stat = 'identity') +
  scale_x_discrete(drop=FALSE) +
  labs(fill = "Variable", colour = "Too small?") +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  scale_colour_manual(values = c("black", "red"), labels = c("3+ response", "< 3 responses")) +
  ylab("Percentage of Yes's") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

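If you actually group Y and N by the other three columns, there will be one observation in each group. However, if you have repeated Y's and N's, you can recode them as 1 and 0 and take the mean to get the percentage. Here is an example: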

 library(tidyverse)

 data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")), 
               col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)), 
               col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")), 
               col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))


 data %>%
    dplyr::group_by(col1,col2,col3) %>%
    mutate(col4 = ifelse(col4 == "Y", 1,0)) %>%
    dplyr::summarise(percentage = mean(col4)) %>%
    ggplot(aes(x = col1, y = percentage, color = as.factor(col2), fill = col3)) +
    geom_col(position = position_dodge(width = .5))
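
The point that grouping by all three columns leaves one observation per group can be checked directly. This is a small sketch assuming dplyr is installed; group_sizes is an illustrative name.

```r
library(dplyr)

data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Count the rows in every col1/col2/col3 combination.
group_sizes <- count(data, col1, col2, col3)

# TRUE when no combination occurs more than once.
one_per_group <- all(group_sizes$n == 1)
```

When every group holds a single row, each "percentage" can only be 0 or 1, which is why the example above is only informative once Y and N are repeated within a group.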
