
Stacked bar plot with 4 categorical variables in R

My problem is displaying 4 categorical variables in a single bar graph in R.

The 4 categorical variables each have 2 or more levels. My idea was to use ggplot2 with geom_bar to create separate bar plots for 3 of the categories, with the counts of each level stacked, and then use facet_wrap to split them out by the 4th category.

The data looks like this:

Species     Crown_Class     Life_class      Stem_Category
E. obliqua  Suppressed      Standing live   Large stems
E. rubida   Intermediate    Standing live   Large stems
E. obliqua  Suppressed      Standing live   Small stems
E. obliqua  Suppressed      Standing live   Small stems
E. rubida   Suppressed      Standing live   Large stems
E. radiata  Suppressed      Standing live   Small stems
E. obliqua  Dominant        Standing live   Small stems
E. obliqua  Suppressed      Standing live   Small stems
E. radiata  Suppressed      Standing live   Large stems
E. rubida   NA              Standing dead   Large stems
E. rubida   Intermediate    Standing live   Large stems
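For anyone who wants to reproduce this, the sample table can be typed in directly as a data frame. This is a sketch: the name `trees` is a hypothetical choice, and the `NA` in Crown_Class is taken to be a missing value rather than a class name.

```r
# Sample rows from the table above; `trees` is a hypothetical name.
trees <- data.frame(
  Species = c("E. obliqua", "E. rubida", "E. obliqua", "E. obliqua",
              "E. rubida", "E. radiata", "E. obliqua", "E. obliqua",
              "E. radiata", "E. rubida", "E. rubida"),
  Crown_Class = c("Suppressed", "Intermediate", "Suppressed", "Suppressed",
                  "Suppressed", "Suppressed", "Dominant", "Suppressed",
                  "Suppressed", NA, "Intermediate"),
  Life_class = c(rep("Standing live", 9), "Standing dead", "Standing live"),
  Stem_Category = c("Large stems", "Large stems", "Small stems",
                    "Small stems", "Large stems", "Small stems",
                    "Small stems", "Small stems", "Large stems",
                    "Large stems", "Large stems"),
  stringsAsFactors = FALSE
)

# Quick check: crown class counts per species, keeping the missing value
table(trees$Species, trees$Crown_Class, useNA = "ifany")
```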

The graph I have in mind shows a stacked bar for each of three categories, which are then grouped by the fourth. For the data given, separate bars for Crown_Class, Life_class and Stem_Category would be displayed for each species.

I have been trying for hours and can make separate plots with this code (though I had to split the data into 3 separate data frames to do it):

ggplot(data= cc, aes(x= Species, fill = Crown_Class))+
geom_bar(position='stack')

ggplot(data=lc, aes(x = Species, fill = Life_class))+
geom_bar(position ='stack')

ggplot(data=sc, aes(x = Species, fill = Stem_Category))+
geom_bar(position ='stack')

The idea was to do something like this:

ggplot()+
  geom_bar(data= cc, aes(x = Species, fill = Crown_Class), 
      position='stack') +
  geom_bar(data=lc, aes(x = Species, fill = Life_class), 
      position ='dodge')+
  facet_wrap(~Species)

But the result is not what I have in mind: the second layer effectively overwrites the first.
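A likely reason for the overlap: each geom_bar() layer counts rows per Species, so both layers place their bars at the same x positions and the later layer is drawn over the earlier one. The base-R sketch below, on a few rows taken from the table above (the name `trees` is an assumption), reproduces what each layer counts.

```r
# A few rows from the question's table; `trees` is a hypothetical name
trees <- data.frame(
  Species     = c("E. obliqua", "E. rubida", "E. rubida", "E. radiata"),
  Crown_Class = c("Suppressed", "Intermediate", NA, "Suppressed"),
  Life_class  = c("Standing live", "Standing live", "Standing dead",
                  "Standing live"),
  stringsAsFactors = FALSE
)

# What each geom_bar() layer counts: rows per Species, split by its fill.
# useNA = "ifany" keeps NA as its own group, as geom_bar() does for fill.
layer1 <- table(trees$Species, trees$Crown_Class, useNA = "ifany")
layer2 <- table(trees$Species, trees$Life_class)

# Both tables are indexed by the same species, i.e. the same x positions,
# and each species has the same total in both layers
rownames(layer1)
rownames(layer2)
rowSums(layer1)
rowSums(layer2)
```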


I would be grateful for any help.

Here's an example of how you could use facet_grid to include all 4 variables on the same plot.

Note that I generate some dummy data, since I had trouble importing your dataset into R.

generate data

library(ggplot2)
theme_set(theme_bw())
set.seed(123)
df1 <- data.frame(s1 = sample(letters[1:3], 11, replace = TRUE),
                  s2 = sample(letters[4:6], 11, replace = TRUE),
                  s3 = sample(letters[7:9], 11, replace = TRUE),
                  s4 = sample(letters[10:12], 11, replace = TRUE),
                  stringsAsFactors = FALSE)

edit:

Maybe this is closer to what you're after:

ggplot(df1)+
    geom_bar(aes(x = s1), position = 'stack')+
    geom_bar(aes(x = s2), position = 'stack')+
    geom_bar(aes(x = s3), position = 'stack')+
    facet_wrap(~ s4)


If you proceed in this manner, you should definitely point out that the values on the x-axis come from three different variables.
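One way to document where each x-axis label comes from is to list the unique values of each column. This sketch regenerates the same dummy data; it also checks that the three label sets are disjoint, which is what keeps the three layers from landing on the same x positions. If two columns shared a value, their bars would overlap at that position.

```r
set.seed(123)
df1 <- data.frame(s1 = sample(letters[1:3], 11, replace = TRUE),
                  s2 = sample(letters[4:6], 11, replace = TRUE),
                  s3 = sample(letters[7:9], 11, replace = TRUE),
                  s4 = sample(letters[10:12], 11, replace = TRUE),
                  stringsAsFactors = FALSE)

# Which variable each x-axis label comes from
x_sources <- lapply(df1[c("s1", "s2", "s3")], function(v) sort(unique(v)))
x_sources

# TRUE when no label appears in more than one variable
anyDuplicated(unlist(x_sources)) == 0
```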

IMHO: While I'm no expert on the subject, I do think it's a bit dubious to put three different variables on the same axis, and ggplot2 gives you plenty of options to avoid doing so.

make plot using facet_grid

ggplot(df1, aes(x = s1, fill = s2))+
    geom_bar(position = 'stack')+
    facet_grid(s3~s4)
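To see the counts behind each bar segment in the facet_grid plot, a base-R cross-tabulation of the same four variables can help. This is a sketch on the same dummy data; ftable() is used only to lay the 4-way table out with the two facet variables as rows.

```r
set.seed(123)
df1 <- data.frame(s1 = sample(letters[1:3], 11, replace = TRUE),
                  s2 = sample(letters[4:6], 11, replace = TRUE),
                  s3 = sample(letters[7:9], 11, replace = TRUE),
                  s4 = sample(letters[10:12], 11, replace = TRUE),
                  stringsAsFactors = FALSE)

# counts[i, j, k, l] is the height of the s2 == j segment in the bar for
# s1 == i, drawn in panel row s3 == k and panel column s4 == l
counts <- xtabs(~ s1 + s2 + s3 + s4, data = df1)
ftable(counts, row.vars = c("s3", "s4"))
```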


make plot using interaction and facet_wrap

Now, suppose you don't want the two grouping factors as facets and would prefer a single faceting variable. In that case, we can combine the other two with the interaction function and map the result to fill.

ggplot(df1, aes(x = s1, fill = interaction(s2,s3)))+
    geom_bar(position = 'stack')+
    facet_wrap(~s4)
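For reference, interaction() pastes the levels of its arguments together (separated by "." by default), so each fill color in the plot above corresponds to one s2/s3 combination. A small base-R illustration with made-up values:

```r
s2 <- c("d", "e", "d")
s3 <- c("g", "g", "h")

# By default every possible combination becomes a level, even unused ones;
# the first argument varies fastest in the level order
f <- interaction(s2, s3)
as.character(f)  # "d.g" "e.g" "d.h"
levels(f)        # "d.g" "e.g" "d.h" "e.h"

# drop = TRUE keeps only the combinations that actually occur
levels(interaction(s2, s3, drop = TRUE))  # "d.g" "e.g" "d.h"
```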


use Rmisc::multiplot

Finally, we can create three separate plots and then use Rmisc::multiplot to draw them on the same page.

library(Rmisc)
p1 <- ggplot(df1, aes(x = s1, fill = s2))+
    geom_bar(position = 'stack')
p2 <- ggplot(df1, aes(x = s1, fill = s3))+
    geom_bar(position = 'stack')
p3 <- ggplot(df1, aes(x = s1, fill = s4))+
    geom_bar(position = 'stack')

multiplot(p1,p2,p3, cols = 3)


Since you are trying to differentiate your plots using Crown_Class, Life_class and Stem_Category, ggplot2 would prefer those values to be in a column of their own (in general, ggplot2 likes long data, where only one column contains the values being plotted). We can reorganize the data using tidyr.

library(tidyr)
df <-
  gather(df, variable, value, -Species)

head(df)
     Species    variable           value
1 E. obliqua Crown_Class      Suppressed
2  E. rubida Crown_Class    Intermediate
3 E. obliqua Crown_Class      Suppressed
4 E. obliqua Crown_Class      Suppressed
5  E. rubida Crown_Class      Suppressed
6 E. radiata Crown_Class      Suppressed
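In current versions of tidyr, gather() is superseded by pivot_longer(). A sketch of the equivalent call, on three rows in the question's format (the name `trees` is an assumption):

```r
library(tidyr)

# Three rows of the question's data; `trees` is a hypothetical name
trees <- data.frame(
  Species       = c("E. obliqua", "E. rubida", "E. rubida"),
  Crown_Class   = c("Suppressed", "Intermediate", NA),
  Life_class    = c("Standing live", "Standing live", "Standing dead"),
  Stem_Category = c("Large stems", "Large stems", "Large stems"),
  stringsAsFactors = FALSE
)

# Same reshape as gather(trees, variable, value, -Species)
long <- pivot_longer(trees, cols = -Species,
                     names_to = "variable", values_to = "value")
long
```

One difference to be aware of: pivot_longer() keeps the values from each original row together, whereas gather() stacks one whole column after another, so the rows come out in a different order. The plots do not depend on that order.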

Now we can facet_wrap on variable:

ggplot(df) +
  geom_bar(aes(x = Species, fill = value)) +
  facet_wrap(~ variable) 
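If the goal is the layout described in the question, with one stacked bar per original variable for each species, a variation is to swap the roles: put `variable` on the x-axis and facet by Species. This is a sketch, not part of the original answer, on a few illustrative long-format rows:

```r
library(ggplot2)

# Long-format rows like those produced by the reshape above
long <- data.frame(
  Species  = rep(c("E. obliqua", "E. rubida", "E. rubida"), each = 3),
  variable = rep(c("Crown_Class", "Life_class", "Stem_Category"), times = 3),
  value    = c("Suppressed", "Standing live", "Large stems",
               "Intermediate", "Standing live", "Large stems",
               NA, "Standing dead", "Large stems"),
  stringsAsFactors = FALSE
)

# One stacked bar per original variable, one panel per species
p <- ggplot(long, aes(x = variable, fill = value)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Species)
print(p)
```

As with the plot above, all levels still share a single fill legend.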


If you don't like having a single guide for all the colors of Crown_Class, Life_class and Stem_Category, you can make three separate plots and combine them using the gridExtra package.

library(dplyr)
library(gridExtra)
p <-
  df %>%
  filter(variable == 'Crown_Class') %>%
  ggplot() +
  geom_bar(aes(x = Species, fill = value)) +
  facet_wrap(~ variable)

q <-
  df %>%
  filter(variable == 'Life_class') %>%
  ggplot() +
  geom_bar(aes(x = Species, fill = value)) +
  facet_wrap(~ variable)

r <-
  df %>%
  filter(variable == 'Stem_Category') %>%
  ggplot() +
  geom_bar(aes(x = Species, fill = value)) +
  facet_wrap(~ variable)

grid.arrange(p, q, r)
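The three pipelines above differ only in the filtered variable, so a sketch of the same idea with lapply() avoids the copy-paste. It uses base subset() instead of dplyr::filter() and a few illustrative long-format rows; grid.arrange() accepts a list of plots through its `grobs` argument.

```r
library(ggplot2)
library(gridExtra)

# Long-format rows like those produced by the reshape above
long <- data.frame(
  Species  = rep(c("E. obliqua", "E. rubida", "E. rubida"), each = 3),
  variable = rep(c("Crown_Class", "Life_class", "Stem_Category"), times = 3),
  value    = c("Suppressed", "Standing live", "Large stems",
               "Intermediate", "Standing live", "Large stems",
               NA, "Standing dead", "Large stems"),
  stringsAsFactors = FALSE
)

# One plot per original variable, each with its own fill guide
plots <- lapply(c("Crown_Class", "Life_class", "Stem_Category"), function(v) {
  ggplot(subset(long, variable == v)) +
    geom_bar(aes(x = Species, fill = value)) +
    facet_wrap(~ variable)
})

# Stack the three plots on one page
grid.arrange(grobs = plots)
```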

