Ordering of R geom_bar plot

Question

I have a dataset (1000 IDs, 9 classes) similar to this one:

ID     Class     Value
1      A         0.014
1      B         0.665
1      C         0.321
2      A         0.234
2      B         0.424
2      C         0.342
...    ...       ...

The Value column holds relative abundances, i.e. the values of all classes for one individual sum to 1.

I would like to create a ggplot geom_bar plot in R where the x axis is not ordered by IDs but by decreasing class abundance, similar to this one:

In our example, let's say that Class B is the most abundant class across all individuals, followed by Class C and finally Class A. The first bar on the x axis would then be the individual with the highest Class B value, the second bar would be the individual with the second highest Class B value, and so on.

This is what I tried:

ggplot(df, aes(x=ID, y=Value, fill=Class)) +
  geom_bar(stat="identity") +
  xlab("") +
  ylab("Relative Abundance\n")

Answer 1

You can do the reordering before passing the result to ggplot():

library(dplyr)
library(ggplot2)

# sum the abundance for each class, across all IDs, & sort the result
sort.class <- df %>% 
  count(Class, wt = Value) %>%
  arrange(desc(n)) %>%
  pull(Class)

# get ID order, sorted by each ID's abundance in the most abundant class
ID.order <- df %>%
  filter(Class == sort.class[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# factor ID / Class in the desired order
df %>%
  mutate(ID = factor(ID, levels = ID.order)) %>%
  mutate(Class = factor(Class, levels = rev(sort.class))) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1) # geom_col() is equivalent to geom_bar(stat = "identity")
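
For readers who want to check the two ordering steps without dplyr, here is a minimal base R sketch of the same idea. It uses only the six rows shown in the question, and the names `toy`, `class.totals`, and `top` are made up for this illustration and do not appear in the original answer.

```r
# Toy data: the six rows for IDs 1 and 2 shown in the question
toy <- data.frame(
  ID    = rep(1:2, each = 3),
  Class = rep(c("A", "B", "C"), times = 2),
  Value = c(0.014, 0.665, 0.321, 0.234, 0.424, 0.342),
  stringsAsFactors = FALSE
)

# Total abundance per class, sorted from most to least abundant
class.totals <- sort(tapply(toy$Value, toy$Class, sum), decreasing = TRUE)
sort.class <- names(class.totals)

# Keep only the most abundant class, then order IDs by decreasing Value
top <- toy[toy$Class == sort.class[1], ]
ID.order <- top$ID[order(top$Value, decreasing = TRUE)]

# Apply both orderings as factor levels, ready to pass to ggplot()
toy$ID    <- factor(toy$ID, levels = ID.order)
toy$Class <- factor(toy$Class, levels = rev(sort.class))

print(sort.class)  # "B" "C" "A"
print(ID.order)    # 1 2
```

With this toy input the class order comes out as B, C, A, which matches the scenario described in the question, and ID 1 is placed first because it has the larger Class B value.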

Sample data:

library(dplyr) # group_by(), mutate(), arrange()
library(tidyr) # gather()

set.seed(1234)
df <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  gather(Class, Value, -ID) %>%
  group_by(ID) %>%
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>% 
  arrange(ID, Class)

> df
# A tibble: 400 x 3
      ID Class  Value
   <int> <chr>  <dbl>
 1     1 A     0.143 
 2     1 B     0.357 
 3     1 C     0.429 
 4     1 D     0.0714
 5     2 A     0.176 
 6     2 B     0.412 
 7     2 C     0.294 
 8     2 D     0.118 
 9     3 A     0.2   
10     3 B     0.4   
# ... with 390 more rows
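
As a side note, gather() still works but has been superseded by pivot_longer() since tidyr 1.0.0. A sketch of the same sample data built with pivot_longer(), assuming tidyr 1.0.0 or later is installed, could look like this:

```r
library(dplyr)
library(tidyr)

set.seed(1234)
df <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  # reshape every column except ID into Class / Value pairs
  pivot_longer(cols = -ID, names_to = "Class", values_to = "Value") %>%
  group_by(ID) %>%
  # rescale so that each ID's values sum to 1
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>%
  arrange(ID, Class)
```

The result has the same three columns (ID, Class, Value) and one row per ID and class, so it can be used with the reordering code above unchanged.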
