
Plot per-group means over geom_bar w

I have a data frame with three columns: a factor (here representing a chapter in a book), a numerical ID (representing where the sentence occurs in the book), and a value (representing the number of words in the sentence). It looks something like this:

sentence.length
# A tibble: 5,368 x 3
   Chapter    ID Length
   <fct>   <dbl>  <dbl>
 1 1           1    294
 2 1           2     19
 3 1           3     77
 4 1           4     57
 5 1           5     18
 6 1           6     18
 7 1           7     27
 8 1           8     56
 9 1           9     32
10 1          10     25
# ... with 5,358 more rows

I have a plot that is very close to what I want.

library(ggplot2)

ggplot(sentence.length, aes(x = ID, y = Length, fill = Chapter)) +
  geom_bar(stat = 'identity')

[Image: the resulting ggplot bar chart]

What I'd like to add, over every group, is a horizontal line representing the mean of that group.

This code, modified from another question, gets me close:

  stat_summary(fun = mean, aes(x = 1, yintercept = after_stat(y), group = Chapter), geom = "hline")

But the lines extend across the entire plot; is there a way to plot that mean line only over the relevant portion of the plot? I suspect the issue here is that my data happens to be ordered such that a group corresponds to a continuous part of the plot; but there is nothing in the aesthetics of the plot itself to require this.

An even closer approach is to use geom_smooth instead of stat_summary: geom_smooth(method='lm', se=FALSE) gets me really close. But rather than a linear regression, I really just want the mean of each group (here, the per-chapter mean sentence length).

[Image: the same plot with per-chapter geom_smooth lines]
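A minimal sketch of a tweak to that geom_smooth idea: an intercept-only model (formula = y ~ 1) fitted by lm has the group mean as its only coefficient, so the "regression line" becomes exactly the per-group mean. Because the groups come from the fill = Chapter mapping, each line only spans its own chapter's IDs. The data frame below is a made-up stand-in for sentence.length, and geom_col() is used as shorthand for geom_bar(stat = 'identity').

```r
library(ggplot2)

# Hypothetical stand-in for sentence.length: three chapters of
# consecutive sentence IDs with made-up lengths.
set.seed(1)
sentence.length <- data.frame(
  Chapter = factor(rep(1:3, times = c(40, 25, 35))),
  ID      = 1:100,
  Length  = round(rexp(100, rate = 1 / 25))
)

p <- ggplot(sentence.length, aes(x = ID, y = Length, fill = Chapter)) +
  geom_col() +
  # y ~ 1 fits only an intercept per group, i.e. the group mean;
  # each line is drawn over that group's own range of ID values.
  geom_smooth(method = "lm", formula = y ~ 1, se = FALSE, colour = "black")

print(p)
```

Compared with stat_summary(geom = "hline"), which always spans the whole panel, this keeps each mean line confined to its chapter without any extra summary table.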

Is there a better/simpler approach?

I'm not sure if it's the simplest way to do this, but it works:


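library(tidyverse)  # ggplot2 + dplyr
library(wrapr)      # provides the %.>% "dot pipe"

# df is the question's data frame (sentence.length)
df %.>%
  ggplot(data = ., aes(
    x = ID,
    y = Length,
    fill = Chapter
  )) +
  geom_col() +  # same as geom_bar(stat = 'identity')
  # one segment per chapter, from its first to its last ID, at the chapter mean
  geom_segment(data = group_by(., Chapter) %>%
    summarise(
      mean_len = mean(Length),
      min_id = min(ID),
      max_id = max(ID)
    ),
    aes(
      x = min_id,
      xend = max_id,
      y = mean_len,
      yend = mean_len
    ),
    color = 'steelblue',
    linewidth = 1.2  # use size = 1.2 on ggplot2 < 3.4.0
  )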

The %.>% pipe lets you pass df down so it can also be summarised inside the geom_segment() call: anywhere after %.>%, df is available as `.`.
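If you would rather not pull in wrapr, the same segment idea can be sketched with an ordinary dplyr summary computed up front and passed to geom_segment() through its own data argument. This assumes ggplot2 3.4.0 or later (for linewidth) and uses a made-up stand-in for the question's data; the name chapter_means is hypothetical.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for sentence.length
set.seed(1)
sentence.length <- data.frame(
  Chapter = factor(rep(1:3, times = c(40, 25, 35))),
  ID      = 1:100,
  Length  = round(rexp(100, rate = 1 / 25))
)

# One row per chapter: mean length plus the ID span the chapter covers
chapter_means <- sentence.length %>%
  group_by(Chapter) %>%
  summarise(mean_len = mean(Length), min_id = min(ID), max_id = max(ID))

p <- ggplot(sentence.length, aes(x = ID, y = Length, fill = Chapter)) +
  geom_col() +
  geom_segment(
    data = chapter_means,
    aes(x = min_id, xend = max_id, y = mean_len, yend = mean_len),
    inherit.aes = FALSE,  # don't pick up the plot-level x/y/fill mappings
    colour = "steelblue",
    linewidth = 1.2
  )

print(p)
```

inherit.aes = FALSE keeps the segment layer from inheriting x = ID, y = Length and fill = Chapter, which the one-row-per-chapter table doesn't need.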

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
