How to calculate the mean and sd of FPKM gene counts by group and combine the mean and sd into one data frame?
Luckily, the first step, calculating the mean and sd by group, is already done. I now have the mean and sd results separately, and what I want to do is combine them. I don't mind how easy or difficult the combination method is, but the combined data frame should be simple.
Below I show how I calculate the statistics and the only combination method I know. I need another combination method, please. My sample data and code are below:
library(dplyr)
library(tidyr)

data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
rownames(data) <- NULL
data$Name <- rep(paste0("Group_", 1:10), each = 2)
unique(data$Name)

## 1. group_by: gives only one column's result at a time
mm <- data %>%
  group_by(Name) %>%
  summarise(mean = mean(Gene_1))
mm

## 2. tapply: gives the group means, but only for one column at a time
mm <- data.frame(mean_Gene_1 = tapply(data[, "Gene_1"], data$Name, mean))
mm

## 3. aggregate: a powerful function that gets the results for all columns by group
mm <- aggregate(. ~ Name, data, mean)
mm

## get the mean and sd data frames
## (named mean_df and sd_df so they do not mask the mean() and sd() functions)
mean_df <- aggregate(. ~ Name, data, mean)
sd_df <- aggregate(. ~ Name, data, sd)

## now combine the two data frames using the shared keys "Name" and "Gene"
## I learned this method from somebody on Stack Overflow.
## (stored as `combined` so the original `data` is not overwritten)
combined <- bind_rows(list(mean = mean_df, sd = sd_df), .id = "stat")
data_mean_sd <- combined %>%
  pivot_longer(-c(Name, stat), names_to = "Gene", values_to = "value") %>%
  pivot_wider(names_from = "stat", values_from = "value")
The result is correct, but the output is large even for this small example, and it contains a lot of duplicated data. I hope somebody can suggest a better method that simplifies my result.
I need your help.
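One possible way to keep the combined result compact, offered only as a sketch: if one row per group is acceptable, the two aggregate() results can be joined side by side with base R's merge(), letting its suffixes argument label the mean and sd columns. The names mean_df, sd_df and combined_wide, and the fixed seed, are assumptions made for this illustration.

```r
# Assumption: a fixed seed, only so that the sketch is reproducible
set.seed(1)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

# Per-group mean and sd for every gene column
mean_df <- aggregate(. ~ Name, data, mean)
sd_df <- aggregate(. ~ Name, data, sd)

# Join on Name; the gene columns share names in both data frames,
# so merge() appends the suffixes to tell them apart
combined_wide <- merge(mean_df, sd_df, by = "Name",
                       suffixes = c("_mean", "_sd"))

dim(combined_wide)
```

This gives one row per group with columns such as Gene_1_mean and Gene_1_sd, so each group name appears only once; whether the wide layout is more convenient than the long one depends on what is done with the table afterwards.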
I am not sure, but would the approach below work for you? The last part is basically the same, using pivot_longer and pivot_wider, but for the summarise part I used dplyr::across.
library(dplyr)
library(tidyr)

data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- c(paste0("Gene_", 1:25))
rownames(data) <- NULL
data$Name <- c(rep(paste0("Group_", 1:10), each = 2))

data %>%
  group_by(Name) %>%
  summarise(across(everything(),
                   list(mean = ~ mean(.x),
                        sd = ~ sd(.x)),
                   .names = "{col}__{fn}")) %>%
  pivot_longer(-c(Name), names_to = "Gene", values_to = "value") %>%
  separate(., Gene, into = c("Gene", "Stats"), sep = "__") %>%
  pivot_wider(names_from = Stats, values_from = "value")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 250 x 4
#>    Name    Gene     mean     sd
#>    <chr>   <chr>   <dbl>  <dbl>
#>  1 Group_1 Gene_1   534. 556.
#>  2 Group_1 Gene_2   294.  51.6
#>  3 Group_1 Gene_3   262. 350.
#>  4 Group_1 Gene_4   615  338.
#>  5 Group_1 Gene_5    89   43.8
#>  6 Group_1 Gene_6   322  263.
#>  7 Group_1 Gene_7   696. 391.
#>  8 Group_1 Gene_8   182. 101.
#>  9 Group_1 Gene_9   582  139.
#> 10 Group_1 Gene_10  184    2.83
#> # ... with 240 more rows
Created on 2021-01-27 by the reprex package (v0.3.0)
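A variant of the same idea, given here as a sketch and not as part of the original answer: reshaping to long format first lets a single summarise() call produce both statistics, so the separate() and pivot_wider() steps are not needed. The fixed seed and the name data_mean_sd are assumptions for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Assumption: a fixed seed, only so that the sketch is reproducible
set.seed(1)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

data_mean_sd <- data %>%
  # one row per (Name, Gene, value) observation
  pivot_longer(-Name, names_to = "Gene", values_to = "value") %>%
  group_by(Name, Gene) %>%
  # both statistics computed in one pass over each Name/Gene pair
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

data_mean_sd
```

The result has the same four columns (Name, Gene, mean, sd) and one row per group and gene, although the row order may differ from the pivot_wider version because summarise() sorts by the grouping columns.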