
R dplyr summarise mean and stdev using group_by

I have a dataframe that looks like this:

df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
                 "Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
                 "Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
                 "Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))

I'm trying to calculate the mean and standard deviation of Frequency for each combination of Experiment and Type, and I first tried running this line:

library(dplyr)

df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency))

If I run this, I get a tibble that starts like this:

Experiment   Type   mean   sd
Exp1         alpha  10     5
Exp1         beta   102.   3.54
Exp1         gamma  1000   NA
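To see why the first attempt averages over fewer values than expected, it can help to count how many rows sit behind each Experiment/Type group: summarise() only sees rows that exist, and sd() of a single value is NA. A small diagnostic sketch using dplyr's count() on the question's data:

```r
library(dplyr)

df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate  = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type       = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
                 "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency  = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

# Rows actually present per Experiment/Type; with three replicates (A, B, C)
# any n below 3 means summarise() averaged over fewer values than intended.
counts <- df %>% count(Experiment, Type)
counts

# sd() needs at least two values, which is why Exp1/gamma (n = 1) gives NA.
sd(1000)
```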

But I'd like R to assume that every Type (alpha, beta, gamma) exists for every combination of Experiment and Replicate, so that if a Type has no Frequency value, R uses 0 instead of leaving that value out.

In other words, I want the values to be calculated like this:

Experiment   Type   mean              sd
Exp1         alpha  mean(10,15,5)     sd(10,15,5)
Exp1         beta   mean(100,0,105)   sd(100,0,105)
Exp1         gamma  mean(1000,0,0)    sd(1000,0,0)

For example, for Exp1 beta, the summarise call above calculates mean(100,105) and sd(100,105), because my df has no beta row for Exp1 Replicate B. But I want R to calculate mean(100,0,105) and sd(100,0,105) instead. Could anyone give me some ideas on how to do this?
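As a quick base-R check of the target numbers, writing the missing replicates for Exp1 as 0 by hand gives the values any tidyverse solution should reproduce. The vectors below are copied from the "mean(...)" table above; the names exp1, means and sds are just for illustration.

```r
# Exp1 values per Type, with the missing replicates written as 0 by hand.
exp1 <- list(
  alpha = c(10, 15, 5),
  beta  = c(100, 0, 105),
  gamma = c(1000, 0, 0)
)

means <- sapply(exp1, mean)
sds   <- sapply(exp1, sd)
round(means, 2)  # alpha 10, beta 68.33, gamma 333.33
round(sds, 2)    # alpha 5, beta 59.23, gamma 577.35
```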

You need to first complete() your data frame, filling in the missing Experiment/Type/Replicate combinations with a Frequency of 0, and then pipe the completed data frame into group_by() and summarise().

library(tidyverse)

df %>% 
  # add every missing Experiment/Type/Replicate combination with Frequency = 0
  complete(Experiment, Type, Replicate, fill = list(Frequency = 0)) %>% 
  group_by(Experiment, Type) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")

# A tibble: 9 × 4
  Experiment Type    mean     sd
  <chr>      <chr>  <dbl>  <dbl>
1 Exp1       alpha  10      5   
2 Exp1       beta   68.3   59.2 
3 Exp1       gamma 333.   577.  
4 Exp2       alpha   3.33   5.77
5 Exp2       beta   66.7   58.0 
6 Exp2       gamma 677.   586.  
7 Exp3       alpha   8.33   7.64
8 Exp3       beta   33.3   57.7 
9 Exp3       gamma 330    572.  
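One caveat: complete(Experiment, Type, Replicate) crosses every Replicate with every Experiment, so if some experiment had only, say, replicates A and B, a zero-filled C would be invented for it. A sketch under that assumption uses tidyr's nesting() to restrict the crossing to Experiment/Replicate pairs that actually appear in df, which matches the question's wording ("every combination of Experiment and Replicate"). For the data in the question every experiment has A, B and C, so the result is the same as above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate  = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type       = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
                 "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency  = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

# nesting() keeps only the Experiment/Replicate pairs that occur in df;
# each pair is then crossed with every Type, and new rows get Frequency = 0.
res <- df %>%
  complete(nesting(Experiment, Replicate), Type, fill = list(Frequency = 0)) %>%
  group_by(Experiment, Type) %>%
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")
res
```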

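You can also include Replicate in the group_by function and convert the output into a wider tibble with one column per replicate. Replicates that have no row for a given Type come out as NA, and those NA values in the numeric columns can be replaced with 0. Then, pasting the per-replicate mean columns together reproduces the mean(...) expressions from the question. Because each Experiment/Type/Replicate group holds a single value, the per-replicate sd is NA (0 after the replacement), which is why the sd column reads sd(0,0,0).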

library(tidyverse)

df %>% group_by(Experiment, Type, Replicate) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency)) %>% 
  # one mean_* and sd_* column per replicate
  pivot_wider(names_from = Replicate, values_from = c(mean, sd)) %>% 
  # missing replicates become 0
  mutate(across(where(is.double), ~ replace_na(., 0))) %>% 
  mutate(mean = paste0("mean(", mean_A, ",", mean_B, ",", mean_C, ")"),
         sd = paste0("sd(", sd_A, ",", sd_B, ",", sd_C, ")")) %>% 
  select(Experiment, Type, mean, sd)

The output is:

# A tibble: 9 x 4
# Groups:   Experiment, Type [9]
  Experiment Type  mean              sd       
  <chr>      <chr> <chr>             <chr>    
1 Exp1       alpha mean(10,15,5)     sd(0,0,0)
2 Exp1       beta  mean(100,0,105)   sd(0,0,0)
3 Exp1       gamma mean(1000,0,0)    sd(0,0,0)
4 Exp2       alpha mean(10,0,0)      sd(0,0,0)
5 Exp2       beta  mean(0,95,105)    sd(0,0,0)
6 Exp2       gamma mean(1010,1020,0) sd(0,0,0)
7 Exp3       alpha mean(15,10,0)     sd(0,0,0)
8 Exp3       beta  mean(0,0,100)     sd(0,0,0)
9 Exp3       gamma mean(0,990,0)     sd(0,0,0)
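The string output above spells out which values go into each calculation. To get the numbers themselves from the same wide layout, one option is to pivot Frequency directly into one column per replicate, fill the gaps with 0, and compute the mean and sd row-wise. This is a sketch that assumes the replicates are exactly A, B and C, as in the question; its mean and sd should match the complete() results in the first answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate  = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type       = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
                 "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency  = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

# One row per Experiment/Type and one column per replicate (A, B, C);
# values_fill = 0 puts a zero wherever a replicate has no row.
wide <- df %>%
  pivot_wider(id_cols = c(Experiment, Type),
              names_from = Replicate, values_from = Frequency,
              values_fill = 0)

# Mean and sd computed across the replicate columns, one row at a time.
res <- wide %>%
  rowwise() %>%
  mutate(mean = mean(c_across(c(A, B, C))),
         sd   = sd(c_across(c(A, B, C)))) %>%
  ungroup() %>%
  arrange(Experiment, Type)
res
```

rowwise() keeps the code readable; for large data, base rowMeans() over the same columns is a faster way to get the mean.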

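Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.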

© 2020-2024 STACKOOM.COM