Plot per-group means over geom_bar
I have a data frame with three columns: a factor (representing here a chapter in a book), a numerical ID (representing where the sentence occurs in the book), and a value (representing the number of words in the sentence). It looks something like this:
sentence.length
# A tibble: 5,368 x 3
Chapter ID Length
<fct> <dbl> <dbl>
1 1 1 294
2 1 2 19
3 1 3 77
4 1 4 57
5 1 5 18
6 1 6 18
7 1 7 27
8 1 8 56
9 1 9 32
10 1 10 25
# ... with 5,358 more rows
I have a plot that is very close to what I want.
ggplot(data,aes(x=ID,y=Length,fill=Chapter)) +
geom_bar(stat='identity')
What I'd like to add, over every group, is a horizontal line representing the mean of that group.
This code, modified from another question, gets me close:
stat_summary(fun.y = mean, aes(x = 1, yintercept = ..y.., group = Chapter), geom = "hline")
But the lines extend across the entire plot; is there a way to plot that mean line only over the relevant portion of the plot? I suspect the issue here is that my data happens to be ordered such that a group corresponds to a continuous part of the plot, but there is nothing in the aesthetics of the plot itself to require this.
An even closer approach is to use not stat_summary but geom_smooth; geom_smooth(method='lm',se=FALSE) gets me really close. But rather than a linear regression, I really just want the mean for the group (here, the per-chapter sentence length mean).
Is there a better/simpler approach?
I'm not sure if it's the simplest way to do this, but it works:
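library(tidyverse)
library(wrapr)   # provides the %.>% pipe

df %.>%
  ggplot(data = ., aes(
    x = ID,
    y = Length,
    fill = Chapter
  )) +
  geom_col() +
  # one segment per chapter, drawn at the chapter mean and spanning
  # only the IDs that belong to that chapter
  geom_segment(data = group_by(., Chapter) %>%
                 summarise(
                   mean_len = mean(Length),
                   min_id = min(ID),
                   max_id = max(ID)
                 ),
               aes(
                 x = min_id,
                 xend = max_id,
                 y = mean_len,
                 yend = mean_len
               ),
               color = 'steelblue',
               linewidth = 1.2   # use size = 1.2 on ggplot2 older than 3.4.0
  )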
With the %.>% pipe you can pass df down and summarise it inside the geom_segment call. After %.>%, you can refer to df as the dot (.).
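If you would rather not depend on wrapr, the same idea can be sketched by computing the per-chapter summary first and handing it to geom_segment as its own data. This is only a sketch under assumptions: the sample data is made up to mimic the Chapter, ID and Length columns from the question, and chapter_means is a name chosen here for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for the question's data: 3 chapters, 20 sentences each
set.seed(1)
df <- tibble(
  Chapter = factor(rep(1:3, each = 20)),
  ID      = 1:60,
  Length  = rpois(60, lambda = 30)
)

# One row per chapter: the mean length and the range of IDs it covers
chapter_means <- df %>%
  group_by(Chapter) %>%
  summarise(
    mean_len = mean(Length),
    min_id   = min(ID),
    max_id   = max(ID)
  )

p <- ggplot(df, aes(x = ID, y = Length, fill = Chapter)) +
  geom_col() +
  geom_segment(
    data = chapter_means,
    aes(x = min_id, xend = max_id, y = mean_len, yend = mean_len),
    inherit.aes = FALSE,   # the summary has no ID or Length columns
    colour = "steelblue",
    linewidth = 1.2        # use size = 1.2 on ggplot2 older than 3.4.0
  )

print(p)
```

Note that each segment runs from the smallest to the largest ID in its chapter, so it only lines up with the bars when a chapter occupies a contiguous block of IDs, which is the ordering the question already relies on. If you prefer to stay with geom_smooth, an intercept-only fit such as geom_smooth(method = "lm", formula = y ~ 1, se = FALSE) should also draw a flat line at each group's mean, since the fitted value of that model is the mean.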