
Issues with getting correct fill color using ggplot2 geom_bar() when plotting two groups on the same graph

I'm having an issue with geom_bar() in ggplot2, where the colors of the bars are not correctly set based on the group a datapoint belongs to, but instead the higher datapoint is always one color, and the lower always the other color.

Imagine I have two groups, black group and blue group. I want to plot the distribution of the number of pizzas eaten by members of the group. So, I have a table which lists, for every number of pizzas, the % of all people in the group who ate that number.

When I plot this using geom_point(), everything is colored correctly.

However, when I plot it using geom_bar(), the taller bar is always colored black, even when it should be blue. I'm extremely puzzled about what is going wrong here. How do I get the bars to display the correct color? Example data and code to reproduce the problem are below, along with pictures of the two graphs I'm talking about.

library(ggplot2)

data = data.frame(structure(list(
  pizzas = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,
             10, 10, 11, 11, 12, 12),
  color = c("black", "blue", "black", "blue", "black", "blue", "black", "blue",
            "black", "blue", "black", "blue", "black", "blue", "black", "blue",
            "black", "blue", "black", "blue", "black", "blue", "black", "blue",
            "black", "blue"),
  value = c(0.346153846153846, 0.234042553191489, 0.153846153846154,
            0.148936170212766, 0.115384615384615, 0.106382978723404,
            0.153846153846154, 0.127659574468085, 0.0192307692307692,
            0.0638297872340425, 0.0576923076923077, 0.127659574468085,
            0.0576923076923077, 0.0851063829787234, 0.0384615384615385,
            0.0425531914893617, 0.0384615384615385, 0, 0, 0, 0, 0, 0,
            0.0425531914893617, 0.0192307692307692, 0.0212765957446809)
), row.names = c(NA, -26L), class = c("tbl_df", "tbl", "data.frame")))

#This colors things correctly
ggplot(data=data, aes(x = pizzas, y=value, color = color)) +
  scale_color_manual(values=c('black', 'blue')) +
  geom_point(size=3) +
  ylab("Percent frequency") +
  xlab("Number pizzas eaten")

#This colors things incorrectly, with the higher bar always being black
ggplot(data=data, aes(x = pizzas, y=value, fill = color)) +
  scale_fill_manual(values=c('black', 'blue')) +
  geom_bar(alpha=.5, stat='identity') +
  ylab("Percent frequency") +
  xlab("Number pizzas eaten")

This is what the plot looks like using geom_point(), where everything is colored correctly: [image]

And this is what the plot looks like using geom_bar(), where for some reason the taller bar is always black: [image]

Your second plot is actually a stacked bar chart: at each x value the blue segment shows the value for blue, the black segment stacked on top of it shows the value for black, and the total height is the sum of the two. I'm not sure what your intention is, but perhaps you wanted to show the blue and black values side by side? If so, you can do that by adding position = "dodge" to geom_bar(), as follows.

ggplot(data=data, aes(x = pizzas, y=value, fill = color)) +
  scale_fill_manual(values=c('black', 'blue')) +
  geom_bar(alpha=.5, stat='identity', position = "dodge") +
  ylab("Percent frequency") +
  xlab("Number pizzas eaten")

[image]
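To check the stacking explanation without reading it off the picture, one option is to inspect the rectangles ggplot2 computes for each layer. The sketch below uses a small made-up data frame called toy (not the asker's data) and ggplot2's layer_data() helper; the object names p_stack, p_dodge, stacked, and dodged are illustrative only.

```r
library(ggplot2)

# Made-up example data (an assumption, not the data from the question):
# two groups, one value per group at each x position.
toy <- data.frame(
  pizzas = c(0, 0, 1, 1),
  color  = c("black", "blue", "black", "blue"),
  value  = c(0.3, 0.2, 0.1, 0.4)
)

# Default position for geom_bar() is "stack".
p_stack <- ggplot(toy, aes(x = pizzas, y = value, fill = color)) +
  geom_bar(stat = "identity")

# Same plot, but with the bars placed side by side.
p_dodge <- ggplot(toy, aes(x = pizzas, y = value, fill = color)) +
  geom_bar(stat = "identity", position = "dodge")

# layer_data() returns the data frame ggplot2 actually draws for a layer,
# including the ymin/ymax of every bar.
stacked <- layer_data(p_stack)
dodged  <- layer_data(p_dodge)

# Stacked: the top of the tallest segment at each x is the sum of both groups.
stack_tops <- tapply(stacked$ymax, stacked$x, max)
print(stack_tops)

# Dodged: every bar starts at 0 and reaches only its own group's value.
print(dodged[, c("x", "ymin", "ymax")])
```

If the explanation holds, the stacked tops equal the per-x sums of value, while in the dodged version each ymax matches a single row of toy.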

Update: the problem was that geom_bar() stacks the bars by default, and I needed to set the position argument to stop this behavior.

The solution is:

ggplot(data=data, aes(x = pizzas, y=value, fill = color)) +
  scale_fill_manual(values=c('black', 'blue')) +
  geom_bar(alpha=.5, stat='identity', position = "dodge") +
  ylab("Percent frequency") +
  xlab("Number pizzas eaten")

Which gives: [image]

Or else this, if I want them literally overlaid on each other, using the width argument to position_dodge():

ggplot(data=data, aes(x = pizzas, y=value, fill = color)) +
  scale_fill_manual(values=c('black', 'blue')) +
  geom_bar(alpha=.5, stat='identity', position = position_dodge(width=0)) +
  ylab("Percent frequency") +
  xlab("Number pizzas eaten")

Which gives: [image]
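As a possible alternative for the overlaid version, position = "identity" also draws each bar from zero at its own x position without stacking or dodging, and geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"). This is only a sketch on a made-up data frame (toy, p_overlay, and overlaid are illustrative names, not from the posts above), and it has not been compared pixel for pixel with the position_dodge(width = 0) plot.

```r
library(ggplot2)

# Made-up example data (an assumption, not the data from the question).
toy <- data.frame(
  pizzas = c(0, 0, 1, 1),
  color  = c("black", "blue", "black", "blue"),
  value  = c(0.3, 0.2, 0.1, 0.4)
)

# geom_col() uses the y values as given; position = "identity" leaves
# every bar where its data put it, so the two groups overlap.
p_overlay <- ggplot(toy, aes(x = pizzas, y = value, fill = color)) +
  scale_fill_manual(values = c("black", "blue")) +
  geom_col(alpha = 0.5, position = "identity") +
  ylab("Percent frequency") +
  xlab("Number pizzas eaten")

# Inspect the computed rectangles: bars at the same x share xmin/xmax
# and all start at 0, so they are drawn on top of each other.
overlaid <- layer_data(p_overlay)
print(overlaid[, c("x", "xmin", "xmax", "ymin", "ymax")])
```

With alpha below 1 the shorter bar stays visible through the taller one, which is what makes the overlay readable.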

