Grouped bar plot in ggplot2

I am trying to make a grouped bar chart with data in long form.

Here is the data:

pivot_sample <- structure(list(group = c("group1", "group2", "group3", "group1", 
"group2", "group1", "group1", "group1", "group4", "group1", "group4", 
"group4", "group1", "group4", "group1", "group1", "group2", "group1", 
"group4", "group2", "group4", "group2", "group3", "group3", "group1", 
"group1", "group3", "group3", "group1", "group1", "group3", "group1", 
"group4", "group3", "group3", "group1", "group2", "group1", "group4", 
"group1", "group3", "group3", "group3", "group2", "group2", "group4", 
"group3", "group3", "group3", "group2", "group3", "group2", "group1", 
"group1", "group3", "group1", "group1", "group2", "group4", "group1", 
"group4", "group1", "group1", "group4", "group1", "group3", "group4", 
"group1", "group4", "group2", "group4", "group1", "group2", "group4", 
"group1", "group4", "group1", "group2", "group1", "group1", "group1", 
"group1", "group2", "group1", "group3", "group1", "group1", "group1", 
"group3", "group4", "group1", "group3", "group1", "group3", "group4", 
"group1", "group2", "group1", "group3", "group1"), category = c("category4", 
"category5", "category2", "category4", "category3", "category6", 
"category3", "category1", "category4", "category2", "category6", 
"category6", "category5", "category5", "category4", "category4", 
"category1", "category6", "category1", "category4", "category6", 
"category6", "category2", "category6", "category3", "category2", 
"category6", "category3", "category6", "category1", "category6", 
"category2", "category2", "category2", "category5", "category1", 
"category1", "category4", "category3", "category4", "category4", 
"category5", "category1", "category3", "category5", "category2", 
"category2", "category5", "category5", "category2", "category6", 
"category6", "category5", "category1", "category4", "category3", 
"category6", "category1", "category6", "category3", "category2", 
"category2", "category3", "category2", "category2", "category5", 
"category4", "category4", "category4", "category4", "category1", 
"category5", "category6", "category5", "category4", "category5", 
"category1", "category2", "category3", "category5", "category3", 
"category2", "category4", "category6", "category4", "category6", 
"category1", "category4", "category4", "category3", "category4", 
"category5", "category5", "category6", "category4", "category3", 
"category5", "category3", "category3", "category1"), count = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

When I run the following:

pivot_sample %>% 
  ggplot(aes(x=group,fill=category))+
  geom_bar()

The default stat_count() seems to work just fine with the default position="stack". However, when I switch to position="dodge" in the code below:

pivot_sample %>% 
  ggplot(aes(x=group,y=count,fill=category))+
  geom_bar(position = "dodge",stat = "identity")

It won't count the count variable.

I am sure there is something basic I am missing and could use another perspective. Do I need to use a count function for the y= argument in aes()?

All help would be appreciated!

OP, the simple answer here is just to add position="dodge" to your original plot code. That separates the bars according to the group aesthetic, which is not specified here, so ggplot2 falls back to grouping by the discrete variables mapped in the plot (in this case, the fill aesthetic):

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge')


The reason is that the default for the stat argument in geom_bar() is stat="count". This counts the observations in each group and plots that number on the y axis. You can refer to the computed value with the .. notation, ..count.., but it's not necessary with geom_bar(). The code below is a more explicit form that produces the same plot:

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge', aes(y=..count..), stat="count")
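
In ggplot2 3.4.0 and later the ..count.. notation is deprecated in favor of after_stat(count). Here is a minimal sketch of the same long form in the newer spelling, assuming a reasonably current ggplot2. Note that toy is a hypothetical stand-in I made up for illustration, not your pivot_sample:

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration, not the asker's data)
toy <- data.frame(
  group    = c("g1", "g1", "g1", "g2", "g2"),
  category = c("a",  "a",  "b",  "a",  "b"),
  count    = c(0,    1,    0,    0,    1)
)

# after_stat(count) refers to the value computed by stat_count(),
# not to the column toy$count
p <- ggplot(toy, aes(x = group, fill = category)) +
  geom_bar(aes(y = after_stat(count)), position = "dodge", stat = "count")

p
```

The plot should be the same as the one from plain geom_bar(position = "dodge") on the same data.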

Note that your data frame has a column called "count", but pivot_sample$count is not what is accessed when you use ..count.. in aes(). What's being accessed there is the result computed by stat="count", that is, the number of rows in each group.
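
If you want to check this yourself, layer_data() from ggplot2 returns the data a layer holds after its stat has run. The sketch below uses a hypothetical data frame toy (made up for illustration, with the same three column names as yours) and compares the computed count with the column of the same name:

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration, not the asker's data)
toy <- data.frame(
  group    = c("g1", "g1", "g1", "g2", "g2"),
  category = c("a",  "a",  "b",  "a",  "b"),
  count    = c(0,    1,    0,    0,    1)
)

p <- ggplot(toy, aes(x = group, fill = category)) +
  geom_bar(position = "dodge")

# Data after stat_count() has run: one row per group/category combination
computed <- layer_data(p)
computed[, c("x", "group", "count")]

# The computed counts add up to the number of rows ...
sum(computed$count)
# ... while the column named "count" adds up to something else entirely
sum(toy$count)
```

The two totals differ because the first is the number of observations and the second is the sum of a data column that geom_bar() never looks at here.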

What happened when you used stat="identity"? The "identity" stat leaves the data unchanged, so each row's actual value is plotted on the y axis. You specified y=count, which means that the value of the column pivot_sample$count was plotted for every row at its group and category. geom_bar with stat="identity" is the same as using geom_col() (which should be used in that case), and it requires both x and y aesthetics to be defined. With the default position="stack", the bars for rows that share an x value are stacked, so the total height is the sum of the y values. With position="dodge", rows that share a group and category are not added up: their bars are drawn on top of one another, so the visible height is the largest value.

In the plot you showed using stat="identity", the bar heights therefore come from the values of pivot_sample$count for each group and category, not from the number of observations. You don't have a lot of values = 1 for that column in the data, so that's why it looks the way it does.

Note that geom_bar() using stat="count" counts observations, whereas stat="identity" plots the values themselves, which add up only when the bars are stacked.
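
If what you actually want is dodged bars showing the total of a value column, one option is to sum it yourself first and then use geom_col(). This is only a sketch under assumptions: toy is a hypothetical data frame made up for illustration, total is a column name I picked, and dplyr is assumed to be loaded (your use of %>% suggests it is):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (an assumption for illustration, not the asker's data)
toy <- data.frame(
  group    = c("g1", "g1", "g1", "g2", "g2"),
  category = c("a",  "a",  "b",  "a",  "b"),
  count    = c(1,    1,    0,    0,    1)
)

# Raw rows with dodge: the two g1/a rows overlap instead of adding up
p_raw <- ggplot(toy, aes(x = group, y = count, fill = category)) +
  geom_col(position = "dodge")

# Sum the value column per group/category before plotting
totals <- toy %>%
  group_by(group, category) %>%
  summarise(total = sum(count), .groups = "drop")

# One row per bar, so dodge now shows the summed value
p_sum <- ggplot(totals, aes(x = group, y = total, fill = category)) +
  geom_col(position = "dodge")

# Tallest bar in each version
max(layer_data(p_raw)$ymax)
max(layer_data(p_sum)$ymax)
```

Pre-aggregating keeps one row per bar, which is what geom_col() with position = "dodge" expects. For your data the same two steps would apply to pivot_sample in place of toy.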
