ggplot geom_bar where x = multiple columns

Question

How can I make a bar plot where the x values come from multiple columns of a data frame?

Fake data:

data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
                   col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)), 
                   col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
                   col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))

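What I want to plot is the number (ideally the percentage) of Y and N in col4, grouped by col1, col2, and col3.

In total, if there are 50 rows and 25 of them have Y, I should be able to make a plot like the following:

I know the basic bar plot with ggplot is: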

ggplot(data, aes(x = col1, fill = col4)) + geom_bar()

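I'm not looking for how many col4 values there are per col3 within each col2, so I don't think facet_wrap() is the trick, but I don't know what to do instead.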

Answer 1

You need to convert your data frame to long format first, and then use the newly created variable in facet_wrap().

library(ggplot2)

# Stack col1, col2 and col3 into one column, keeping col4 alongside each value
data_long <- tidyr::gather(data, key = type_col, value = categories, -col4)

ggplot(data_long, aes(x = categories, fill = col4)) +
  geom_bar() + 
  facet_wrap(~ type_col, scales = "free_x")
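Since the question asks for percentages if possible, here is a minimal sketch of one way to get them with the same long-format idea. It swaps in tidyr's pivot_longer() for gather() and uses position = "fill" so each bar shows the share of Y and N instead of raw counts. Converting col2 to character first and the y-axis label are my own choices, not part of the answer above.

```r
library(tidyr)
library(ggplot2)

# Same fake data as in the question
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Assumption: make col2 character so all three columns share one type when stacked
data$col2 <- as.character(data$col2)

# One row per (original column, value) pair, with col4 carried along
data_long <- pivot_longer(
  data,
  cols = c(col1, col2, col3),
  names_to = "type_col",
  values_to = "categories"
)

# position = "fill" rescales each bar to 1, so the fill shows proportions
p <- ggplot(data_long, aes(x = categories, fill = col4)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ type_col, scales = "free_x") +
  ylab("Share of rows")
```

Printing p draws the plot. Each of the 11 input rows contributes three rows to data_long, one per stacked column.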

Answer 2

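A very rough approximation, in the hope that it sparks conversation and/or gives you enough to get started.

Your data is too small to do much with, so I'll extend it.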

set.seed(2)
n <- 100
d <- data.frame(
  cat1 = sample(c('A','B','C'), size=n, replace=TRUE),
  cat2 = sample(c(2012L,2013L,2014L,2015L), size=n, replace=TRUE),
  cat3 = sample(c('^','v','<','>'), size=n, replace=TRUE),
  val = sample(c('X','Y'), size=n, replace=TRUE)
)

I'm using dplyr and tidyr here to reshape the data:

library(ggplot2)
library(dplyr)
library(tidyr)

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  head
# Warning: attributes are not identical across measure variables; they will be dropped
#   val cattype cat
# 1   Y    cat1   A
# 2   Y    cat1   A
# 3   Y    cat1   C
# 4   Y    cat1   C
# 5   Y    cat1   B
# 6   Y    cat1   C

The next trick is to facet it properly:

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  ggplot(aes(val, fill=cattype)) +
  geom_bar() +
  facet_wrap(~cattype+cat, nrow=1)

Answer 3

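Depending on your needs, you can also use melt from the reshape package to get what you want.

(Note: this solution is very similar to Phil's. If you map col4 to the fill, you can turn his approach into one that uses only the fill, instead of filtering for "Y" and adding a facet wrap.)

Continuing from your data setup: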

library(reshape)

#Reshape the data to sort it by all the other column's categories
data$col2 <- as.factor(as.character(data$col2))

breakdown <- melt(data, "col4")

#Our x values are the individual values, e.g. A, 2012, Down.
#Our fill is what we want it grouped by, in this case variable, which is our col1, col2, col3 (default column name from melt)
ggplot(subset(breakdown, col4 == "Y"), aes(x = value, fill = variable)) +
  geom_bar() +
  # scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Number of Yes's")

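I'm not 100% sure what you're after, but maybe this is closer to it?

Edit: To show the percentage of Yes's, we can use ddply from the plyr package to create a data frame holding the percentage for each value of each variable, and then have the bar plot draw those values instead of counts.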

library(plyr)

#ddply applies a function to a data frame grouped by columns.
#In this case we group by variable (col1, col2 or col3) as well as by the value.
#The function I apply just calculates the percentage, i.e. number of yeses/number of responses
plot_breakdown <- ddply(breakdown, c("variable", "value"), function(x){sum(x$col4 == "Y")/nrow(x)})

#When we plot we now add y = V1 to plot the percentage response
#Also in geom_bar I've now added stat = 'identity' so it doesn't try and plot counts
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable)) +
  geom_bar(aes(group = factor(variable)), position = "dodge", stat = 'identity') +
  scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Percentage of Yes's") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

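The last line I added to the ggplot call is just there to make the y axis look more percentage-y :)

In the comments you mentioned that you want to do this because the sample sizes differ and you want some kind of fair comparison between categories. My advice is to be careful here. Percentages look nice, but they can be misleading when the sample size is small. For example, you get 0% Yes when there is only a single response and it happens to be No. My suggestion is to either exclude the bars whose sample size you consider too small, or make use of the colour aesthetic.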

#Adding an extra column using ddply again which generates a 1 if the sample size is less than 3, and a 0 otherwise
plot_breakdown <- cbind(plot_breakdown,
                        too_small = factor(ddply(breakdown, c("variable", "value"), function(x){ifelse(nrow(x)<3,1,0)})[,3]))

#Same ggplot as before, except with a colour variable now too (outside line of bar)
#Because of this I also added a way to customise the colours which display, and the names of the colour legend
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable, colour = too_small)) +
  geom_bar(size = 2, position = "dodge", stat = 'identity') +
  scale_x_discrete(drop=FALSE) +
  labs(fill = "Variable", colour = "Too small?") +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  scale_colour_manual(values = c("black", "red"), labels = c("3+ response", "< 3 responses")) +
  ylab("Percentage of Yes's") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))
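The other option mentioned above, dropping categories with too few responses, could be sketched roughly as follows. This is not the code from the answer: it uses dplyr and tidyr rather than reshape and plyr, and the names pct, kept and min_n are made up for illustration. The threshold of 3 simply mirrors the one used for the colour flag.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same fake data as in the question, with col2 stored as character
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = as.character(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Sample size and share of "Y" for every value of every column
pct <- data %>%
  pivot_longer(cols = c(col1, col2, col3),
               names_to = "variable", values_to = "value") %>%
  group_by(variable, value) %>%
  summarise(n = n(), pct_yes = mean(col4 == "Y"), .groups = "drop")

# Hypothetical cut-off: keep only categories with at least min_n responses
min_n <- 3
kept <- filter(pct, n >= min_n)

p <- ggplot(kept, aes(x = value, y = pct_yes, fill = variable)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  ylab("Percentage of Yes's")
```

With this data the filter removes 2013, Left and Right, which have fewer than three responses each. Keeping the n column in pct also makes it easy to report how many responses each remaining bar is based on.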

Answer 4

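If you actually group Y and N by all three of the other columns, you will have just one observation in each group. However, if you had repeated Y and N values within a group, you could recode them as 1 and 0 and take the mean to get a percentage. Here is an example: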

 library(tidyverse)

 data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")), 
               col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)), 
               col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")), 
               col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))


 data %>%
    dplyr::group_by(col1,col2,col3) %>%
    mutate(col4 = ifelse(col4 == "Y", 1,0)) %>%
    dplyr::summarise(percentage = mean(col4)) %>%
    ggplot(aes(x = col1, y = percentage, color = as.factor(col2), fill = col3)) +
    geom_col(position = position_dodge(width = .5))
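The point about one observation per group can be checked with a few lines of base R. This is only a quick sanity check on the sample data, and the variable names n_groups and group_pct are my own.

```r
# Same fake data as above
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Number of distinct (col1, col2, col3) combinations
n_groups <- nrow(unique(data[, c("col1", "col2", "col3")]))

# Share of "Y" within each combination; aggregate() only returns
# combinations that actually occur in the data
group_pct <- aggregate(
  cbind(pct_yes = as.numeric(col4 == "Y")) ~ col1 + col2 + col3,
  data = data,
  FUN = mean
)
```

Here n_groups equals nrow(data), so every group holds a single row and each pct_yes is either 0 or 1. The percentages only become informative once the data contains several rows per combination.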


Answer 1: 5, accepted, 2018-02-23 03:39:26

Answer 2: 2, 2018-02-23 03:27:33

Answer 3: 2, 2018-02-23 04:03:07

Answer 4: 1, 2018-02-23 03:29:55
