
How do I create a grouped percent plot in R using ggplot?

The code below produces a sample dataset -

x = c(rep("Category1",5), rep("Category2", 8), rep("Category3", 7))
y = c(rep("No", 2), rep("I don't know", 1), rep("Yes", 2), rep("No", 2), rep("I don't know", 2), rep("Yes", 2), rep("No", 2), rep("I don't know", 3), rep("Yes", 2),rep("No", 2))
df = data.frame(x,y)
colnames(df) = c("Category", "Response")

I am trying to get percentage plots (using ggplot) of the "Response" column. So far I have been able to get a proportion table of "Response" using -

library(dplyr)

total = df %>% 
  count(Response) %>% 
  mutate(prop=prop.table(n))
total

to get

      Response n prop
1 I don't know 6  0.3
2           No 8  0.4
3          Yes 6  0.3

Now I would like to plot a bar chart of those percentages for "Yes", "No" and "I don't know", and label the bars with those percentages.

Next I would like to plot the same thing, but this time grouped by "Category". I have used the code below -

df1 = df %>% 
  count(Category, Response) %>% 
  group_by(Category) %>% 
  mutate(prop=prop.table(n))
df1

to get this output -

  Category  Response         n  prop
  <chr>     <chr>        <int> <dbl>
1 Category1 I don't know     1 0.2  
2 Category1 No               2 0.4  
3 Category1 Yes              2 0.4  
4 Category2 I don't know     2 0.25 
5 Category2 No               4 0.5  
6 Category2 Yes              2 0.25 
7 Category3 I don't know     3 0.429
8 Category3 No               2 0.286
9 Category3 Yes              2 0.286

As you can see, prop is now the proportion within each Category. I am new to ggplot, and all I seem to manage is plotting the actual counts rather than the percentage within each Category. I have started with -

library(ggplot2)

ggplot(df, aes(Category))+
  geom_bar()

However I can't seem to understand how to change the aesthetics to percent.


I think you should use the proportions you have already calculated in df1 and plot them with geom_col, which is equivalent to geom_bar(stat = "identity"): the bar heights are taken from the y values you pass rather than from row counts.

By default, the percentages within each category will be stacked on top of each other, so you can pick them out using a fill aesthetic.
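To make the difference concrete, here is a minimal sketch that rebuilds the sample data from the question and puts the three variants side by side. The object names p_counts, p_col and p_bar are only illustrative, and the ungroup() call is an addition that is not in the question's code.

```r
library(dplyr)
library(ggplot2)

# Rebuild the sample data from the question
df <- data.frame(
  Category = c(rep("Category1", 5), rep("Category2", 8), rep("Category3", 7)),
  Response = c(rep("No", 2), rep("I don't know", 1), rep("Yes", 2),
               rep("No", 2), rep("I don't know", 2), rep("Yes", 2), rep("No", 2),
               rep("I don't know", 3), rep("Yes", 2), rep("No", 2))
)

# Proportion of each Response within its Category
df1 <- df %>%
  count(Category, Response) %>%
  group_by(Category) %>%
  mutate(prop = prop.table(n)) %>%
  ungroup()

# geom_bar() counts the rows itself, so the bar heights are counts
p_counts <- ggplot(df, aes(Category)) +
  geom_bar()

# geom_col() takes the heights from the y aesthetic as given
p_col <- ggplot(df1, aes(Category, prop, fill = Response)) +
  geom_col()

# Same heights, written with geom_bar(stat = "identity")
p_bar <- ggplot(df1, aes(Category, prop, fill = Response)) +
  geom_bar(stat = "identity")
```

Printing p_col and p_bar should give the same stacked chart, with each category's segments adding up to 1.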

It's often helpful to write in the actual percentages too using geom_text, but this is not everyone's cup of tea, so you can remove the geom_text call if you like and the rest of the plot will stay the same.

Finally, since we want to show percentages rather than proportions, we use scales::percent as a shorthand way of labelling the y axis.

ggplot(df1, aes(Category, prop, fill = Response))+
  geom_col() +
  geom_text(aes(label = scales::percent(prop)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
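The same idea should also cover the first part of the question, the overall percentages in total without grouping. The following is only a sketch under a few assumptions of mine: the labels sit just above the bars (vjust = -0.5), they are rounded to whole percentages with the accuracy argument of scales::percent, and the y axis gets a little extra headroom so the top label is not clipped. p_total is an illustrative name.

```r
library(dplyr)
library(ggplot2)

# Rebuild the sample data from the question
df <- data.frame(
  Category = c(rep("Category1", 5), rep("Category2", 8), rep("Category3", 7)),
  Response = c(rep("No", 2), rep("I don't know", 1), rep("Yes", 2),
               rep("No", 2), rep("I don't know", 2), rep("Yes", 2), rep("No", 2),
               rep("I don't know", 3), rep("Yes", 2), rep("No", 2))
)

# Overall proportion of each Response, as in the question
total <- df %>%
  count(Response) %>%
  mutate(prop = prop.table(n))

p_total <- ggplot(total, aes(Response, prop)) +
  geom_col() +
  # accuracy = 1 rounds the labels to whole percentages
  geom_text(aes(label = scales::percent(prop, accuracy = 1)), vjust = -0.5) +
  # extra room at the top so the label above the tallest bar fits
  scale_y_continuous(labels = scales::percent,
                     expand = expansion(mult = c(0, 0.1)))
p_total
```

If you prefer one decimal place on the labels, accuracy = 0.1 can be used instead.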


You can change the styling, including colors, background and positions:

ggplot(df1, aes(Category, prop, fill = Response))+
  geom_col(width = 0.8, position = position_dodge(width = 0.8),
           color = "forestgreen") +
  geom_text(aes(y = prop/2, label = scales::percent(prop)), 
            position = position_dodge(width = 0.8)) +
  scale_y_continuous(labels = scales::percent) +
  theme_classic() +
  scale_fill_brewer(palette = "Greens") +
  theme(panel.grid.minor = element_line(),
        panel.grid.major = element_line())
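As an alternative that skips the pre-computed table, geom_bar can work out the within-category shares itself when you pass position = "fill" on the raw data. This is a sketch of a different technique from the one above, not a replacement for it: it gives a stacked chart only, and adding percentage labels is easier with the df1 approach. The name p_fill and the axis title are illustrative.

```r
library(ggplot2)

# Rebuild the sample data from the question
df <- data.frame(
  Category = c(rep("Category1", 5), rep("Category2", 8), rep("Category3", 7)),
  Response = c(rep("No", 2), rep("I don't know", 1), rep("Yes", 2),
               rep("No", 2), rep("I don't know", 2), rep("Yes", 2), rep("No", 2),
               rep("I don't know", 3), rep("Yes", 2), rep("No", 2))
)

# position = "fill" rescales each category's stacked counts to sum to 1
p_fill <- ggplot(df, aes(Category, fill = Response)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of responses within category")
p_fill
```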



 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM