[英]R dplyr summarise mean and stdev using group_by
I have a dataframe that looks like this:我有一个看起来像这样的数据框:
df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
"Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
"Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
"Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))
I'm trying to calculate mean and stdev of Frequency
for combination of Experiment
and Type
, and I first tried it by running this line:我正在尝试计算
Experiment
和Type
组合的Frequency
的平均值和标准偏差,我首先通过运行以下行来尝试它:
df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency)
If I run this, I get a tibble that looks like below:如果我运行它,我会得到一个如下所示的小标题:
Experiment Type mean sd
Exp1 alpha 10 5
Exp1 beta 102. 3.54
Epx1 gamma 1000 NA
But I'd like R to think that all Type
( alpha
, beta
, gamma
) should exist for every combination of Experiment
and Replicate
, so that if there is no Frequency
value for Type
, R will use 0
instead of not including that value.但我希望 R 认为对于
Experiment
和Replicate
的每个组合都应该存在所有Type
( alpha
, beta
, gamma
),因此如果Type
没有Frequency
值,R 将使用0
而不是不包括该值。
In other words, what I want needs to be calculated like below:换句话说,我想要的计算如下:
Experiment Type mean sd
Exp1 alpha mean(10,15,5) sd(10,15,5)
Exp1 beta mean(100,0,105) sd(100,0,105)
Exp1 gamma mean(1000,0,0) sd(1000,0,0)
For example, for Exp1
beta
, the summarise
function I used above calculates mean(100,105)
and sd(100,105)
because Exp1
Replicate B
doesn't exist in my df
.例如,对于
Exp1
beta
,我上面使用的summarise
函数计算mean(100,105)
和sd(100,105)
因为Exp1
Replicate B
不存在于我的df
中。 But I want R to calculate mean(100,0,105)
and sd(100,0,105)
instead.但我希望 R 计算
mean(100,0,105)
和sd(100,0,105)
。 Would anyone be able to give me some ideas on how to do this?谁能给我一些关于如何做到这一点的想法?
You need to first complete
your dataframe to fill in missing data with 0, then pipe the "completed" dataframe to your functions.您需要首先
complete
数据框以用 0 填充缺失的数据,然后将“已完成”的数据框通过管道传输到您的函数。
library(tidyverse)
df %>%
complete(Experiment, Type, Replicate, fill = list(Frequency = 0)) %>%
group_by(Experiment, Type) %>%
summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")
# A tibble: 9 × 4
Experiment Type mean sd
<chr> <chr> <dbl> <dbl>
1 Exp1 alpha 10 5
2 Exp1 beta 68.3 59.2
3 Exp1 gamma 333. 577.
4 Exp2 alpha 3.33 5.77
5 Exp2 beta 66.7 58.0
6 Exp2 gamma 677. 586.
7 Exp3 alpha 8.33 7.64
8 Exp3 beta 33.3 57.7
9 Exp3 gamma 330 572.
You need to include Replicate
in the group_by
function and conver the output into a wider tibble.您需要在
group_by
函数中包含Replicate
并将输出转换为更宽的 tibble。 The number columns can be mutated by replacing NA values.数字列可以通过替换 NA 值来改变。 Then, concatenating the mean and sd columns would give the desired output.
然后,连接 mean 和 sd 列将给出所需的输出。
df %>% group_by(Experiment, Type, Replicate) %>%
summarise(mean = mean(Frequency), sd = sd(Frequency)) %>%
pivot_wider(names_from = Replicate, values_from = c(mean, sd)) %>%
mutate(across(where(is.double),~ replace_na(.,0))) %>%
mutate(mean = paste0("mean(", mean_A, ",", mean_B, ",", mean_C, ")"),
sd = paste0("sd(", sd_A, ",", sd_B, ",", sd_C, ")")) %>%
select(Experiment, Type, mean, sd)
The output is输出是
# A tibble: 9 x 4
# Groups: Experiment, Type [9]
Experiment Type mean sd
<chr> <chr> <chr> <chr>
1 Exp1 alpha mean(10,15,5) sd(0,0,0)
2 Exp1 beta mean(100,0,105) sd(0,0,0)
3 Exp1 gamma mean(1000,0,0) sd(0,0,0)
4 Exp2 alpha mean(10,0,0) sd(0,0,0)
5 Exp2 beta mean(0,95,105) sd(0,0,0)
6 Exp2 gamma mean(1010,1020,0) sd(0,0,0)
7 Exp3 alpha mean(15,10,0) sd(0,0,0)
8 Exp3 beta mean(0,0,100) sd(0,0,0)
9 Exp3 gamma mean(0,990,0) sd(0,0,0)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.