Adding error bars to ggplot2 bar plot after group by in dplyr

I have the following data in R.

oligo  condition  score
REF    Sample     27.827
REF    Sample     24.622
REF    Sample     31.042
REF    Competitor 21.066
REF    Competitor 18.413
REF    Competitor 36.164
ALT    Sample     75.465
ALT    Sample     57.058
ALT    Sample     66.408
ALT    Competitor 35.420
ALT    Competitor 17.652
ALT    Competitor 21.466

I have munged this and taken the average of the scores for each condition using the group_by and summarise functions in dplyr.

emsa_test <- emsa_1 %>% 
  group_by(oligo,condition) %>%
  summarise_all(mean)

This creates the following table.

oligo  condition  score
ALT    Competitor 24.84600
ALT    Sample     66.31033
REF    Competitor 25.21433
REF    Sample     27.83033

I then plotted this using ggplot2.

ggplot(emsa_test, aes(oligo, score)) + 
geom_bar(aes(fill = condition), 
         width = 0.4, position = position_dodge(width=0.5), color = "black", stat="identity", size=.3) +  
theme_bw() +
ggtitle("CEBP\u03b1") +
theme(plot.title = element_text(size = 40, face = "bold", hjust = 0.5)) +
scale_fill_manual(values = c("#d8b365", "#f5f5f5"))

My issue is that I need to add error bars to the plot. The implementation would be similar to this.

geom_errorbar(aes(ymin=len-se, ymax=len+se), width=.1, position=pd)

However, after the data is munged, the max and min info contained in table 1 is lost. I could add the error bars manually, but I have several plots to make, so I wonder if there is a way to retain this info through the pipeline.

Many thanks.

You can calculate the components on the fly with dplyr like this:

library(tidyverse)

df <- read_table(
"oligo  condition  score
REF    Sample     27.827
REF    Sample     24.622
REF    Sample     31.042
REF    Competitor 21.066
REF    Competitor 18.413
REF    Competitor 36.164
ALT    Sample     75.465
ALT    Sample     57.058
ALT    Sample     66.408
ALT    Competitor 35.420
ALT    Competitor 17.652
ALT    Competitor 21.466"
)

df %>%
  group_by(oligo, condition) %>%
  summarise(
    mean = mean(score),
    sd = sd(score),
    n = n(),
    se = sd / sqrt(n)  # standard error of the mean
  ) %>%
  ggplot(aes(x = oligo, y = mean, fill = condition)) +
  geom_col(position = position_dodge()) +
  geom_errorbar(
    aes(ymin = mean - se, ymax = mean + se), 
    position = position_dodge2(padding = 0.5)
  ) +
  labs(
    title = "Mean Score ± 1 SE"
  )

Created on 2019-04-01 by the reprex package (v0.2.1)
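
Since the question mentions having several plots to make, the same pipeline could be wrapped in a pair of small functions. This is only a sketch: `summarise_mean_se` and `plot_mean_se` are hypothetical names, the column names `oligo`, `condition` and `score` are assumed to match the question's data, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: per-group mean and standard error of the mean.
# Assumes columns named oligo, condition and score.
summarise_mean_se <- function(data) {
  data %>%
    group_by(oligo, condition) %>%
    summarise(
      mean = mean(score),
      se = sd(score) / sqrt(n()),
      .groups = "drop"
    )
}

# Hypothetical helper: dodged bar plot with mean +/- 1 SE error bars.
plot_mean_se <- function(data, title = NULL) {
  # One shared dodge object keeps bars and error bars aligned
  pd <- position_dodge(width = 0.9)
  ggplot(summarise_mean_se(data), aes(x = oligo, y = mean, fill = condition)) +
    geom_col(position = pd) +
    geom_errorbar(
      aes(ymin = mean - se, ymax = mean + se),
      width = 0.2, position = pd
    ) +
    labs(title = title)
}

# The data from the question, rebuilt by hand
emsa_1 <- data.frame(
  oligo = rep(c("REF", "ALT"), each = 6),
  condition = rep(rep(c("Sample", "Competitor"), each = 3), times = 2),
  score = c(27.827, 24.622, 31.042, 21.066, 18.413, 36.164,
            75.465, 57.058, 66.408, 35.420, 17.652, 21.466)
)

emsa_summary <- summarise_mean_se(emsa_1)
p <- plot_mean_se(emsa_1, title = "Mean Score \u00b1 1 SE")
```

Each further data set with the same three columns can then be passed to `plot_mean_se()` without repeating the summary step.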

You can summarize to more than one value and preserve min, max and mean:

emsa_test <- emsa_1 %>% 
  group_by(oligo,condition) %>%
  summarise(mean=mean(score),min=min(score),max=max(score))
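
As a sketch of how that summary might feed the plot, the min and max columns can be mapped straight to `ymin` and `ymax` in `geom_errorbar`, so the bars show the observed range rather than a standard error. The data frame is rebuilt by hand from the question so the example runs on its own; the names `emsa_range`, `pd` and `p_range` are illustrative, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# The data from the question, rebuilt by hand
emsa_1 <- data.frame(
  oligo = rep(c("REF", "ALT"), each = 6),
  condition = rep(rep(c("Sample", "Competitor"), each = 3), times = 2),
  score = c(27.827, 24.622, 31.042, 21.066, 18.413, 36.164,
            75.465, 57.058, 66.408, 35.420, 17.652, 21.466)
)

# Keep mean, min and max per group instead of the mean alone
emsa_range <- emsa_1 %>%
  group_by(oligo, condition) %>%
  summarise(
    mean = mean(score),
    min = min(score),
    max = max(score),
    .groups = "drop"
  )

# One shared dodge object keeps bars and error bars aligned
pd <- position_dodge(width = 0.9)

p_range <- ggplot(emsa_range, aes(x = oligo, y = mean, fill = condition)) +
  geom_col(position = pd) +
  geom_errorbar(aes(ymin = min, ymax = max), width = 0.2, position = pd)
```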

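Not enough reputation to comment, but a note on the standard error in JasonAizkalns's answer, in case others simply copy the code: it should be se = sd/sqrt(n), not sd/n.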
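
To see why the divisor matters, here is a small check using the three REF / Sample scores from the question. The variable names `se_by_n` and `se_by_sqrt_n` are made up for this illustration.

```r
# Scores for the REF / Sample group, copied from the question
score <- c(27.827, 24.622, 31.042)
n <- length(score)

# Dividing by n shrinks the value too much once n > 1
se_by_n <- sd(score) / n

# Standard error of the mean: sample sd divided by sqrt(n)
se_by_sqrt_n <- sd(score) / sqrt(n)

# The two differ by a factor of sqrt(n)
ratio <- se_by_sqrt_n / se_by_n
```

With three replicates per group, dividing by n instead of sqrt(n) makes the error bars too short by a factor of sqrt(3).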
