Ordering of R geom_bar plot
I have a dataset (1000 IDs, 9 classes) similar to this one:
ID Class Value
1 A 0.014
1 B 0.665
1 C 0.321
2 A 0.234
2 B 0.424
2 C 0.342
... ... ...
The Value column holds relative abundances, i.e. the values of all classes for one individual sum to 1.
I would like to create a ggplot geom_bar plot in R where the x axis is not ordered by ID but by decreasing class abundance, similar to this one (image not available):
In our example, let's say that Class B is the most abundant class across all individuals, followed by Class C and finally Class A. The first bar on the x axis would then be the individual with the highest Class B value, the second bar would be the individual with the second-highest Class B value, etc.
This is what I tried:
ggplot(df, aes(x = ID, y = Value, fill = Class)) +
  geom_bar(stat = "identity") +
  xlab("") +
  ylab("Relative Abundance\n")
You can do the reordering before passing the result to ggplot():
library(dplyr)
library(ggplot2)

# sum the abundance for each class, across all IDs, & sort the result
sort.class <- df %>%
  count(Class, wt = Value) %>%
  arrange(desc(n)) %>%
  pull(Class)

# get ID order, sorted by each ID's abundance in the most abundant class
ID.order <- df %>%
  filter(Class == sort.class[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# factor ID / Class in the desired order
df %>%
  mutate(ID = factor(ID, levels = ID.order)) %>%
  mutate(Class = factor(Class, levels = rev(sort.class))) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1) # geom_col is equivalent to geom_bar(stat = "identity")
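To see what the two ordering steps produce, here is a minimal base R sketch run on the six rows shown in the question. The names df_toy, class_totals, sort_class and id_order are made up for this illustration and are not part of the code above; it only mirrors the logic without dplyr.

```r
# Toy data: the six rows shown in the question (two IDs, three classes)
df_toy <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Class = c("A", "B", "C", "A", "B", "C"),
  Value = c(0.014, 0.665, 0.321, 0.234, 0.424, 0.342)
)

# Total abundance per class across all IDs
class_totals <- tapply(df_toy$Value, df_toy$Class, sum)

# Class names sorted from most to least abundant
sort_class <- names(class_totals)[order(class_totals, decreasing = TRUE)]

# IDs sorted by their value in the most abundant class
top <- df_toy[df_toy$Class == sort_class[1], ]
id_order <- top$ID[order(top$Value, decreasing = TRUE)]

print(sort_class)  # "B" "C" "A"
print(id_order)    # 1 2
```

Class B has the largest total (1.089), so the IDs are ranked by their Class B value, which puts ID 1 (0.665) before ID 2 (0.424). These two vectors play the role of sort.class and ID.order when setting the factor levels.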
Sample data:
library(dplyr) # for group_by(), mutate(), arrange()
library(tidyr) # for gather()

set.seed(1234)

# simulate 100 IDs with 4 classes, then rescale so each ID's values sum to 1
df <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  gather(Class, Value, -ID) %>%
  group_by(ID) %>%
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>%
  arrange(ID, Class)
> df
# A tibble: 400 x 3
      ID Class  Value
   <int> <chr>  <dbl>
 1     1 A     0.143
 2     1 B     0.357
 3     1 C     0.429
 4     1 D     0.0714
 5     2 A     0.176
 6     2 B     0.412
 7     2 C     0.294
 8     2 D     0.118
 9     3 A     0.2
10     3 B     0.4
# ... with 390 more rows
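As a side note, gather() is superseded in tidyr 1.0.0 and later. Assuming a recent tidyr is installed, the same sample data can probably be built with pivot_longer() instead. The names df_wide and df_long below are chosen for this sketch only.

```r
library(dplyr)
library(tidyr)

set.seed(1234)

# Wide table: one row per ID, one column per class (same simulation as above)
df_wide <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
)

# Long table: one row per ID/Class pair, rescaled so each ID sums to 1
df_long <- df_wide %>%
  pivot_longer(cols = -ID, names_to = "Class", values_to = "Value") %>%
  group_by(ID) %>%
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>%
  arrange(ID, Class)
```

The result has the same three columns (ID, Class, Value) and 400 rows, so it can be used in place of df in the plotting code.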