
How to calculate the mean and sd of FPKM gene counts by group and combine them into one data frame?

Luckily, the first step, calculating the mean and sd by group, is already done, so I now have the mean and sd results as two separate data frames. What I want to know is how to combine them. The combination method itself can be easy or difficult, but the combined data frame should be simple.

Below I show how I calculate the values, along with the only combination method I know. I need another way to combine them, please. My sample data and code:

library(dplyr)
library(tidyr)

# 20 samples x 25 genes, two samples per group
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
rownames(data) <- NULL
data$Name <- rep(paste0("Group_", 1:10), each = 2)

unique(data$Name)
## 1. group_by + summarise: only one column per call
mm <- data %>% 
  group_by(Name) %>% 
  summarise(mean = mean(Gene_1))
mm

## 2. tapply: gives the group means, but only for one column per call
mm <- data.frame(mean_Gene_1=tapply(data[,"Gene_1"],data$Name,mean))  
mm

## 3. aggregate: a powerful function that summarises all columns by group in one call
mm <- aggregate(.~Name,data,mean) 
mm
        
## Get the mean and sd data frames.
## (Named mean_df / sd_df so they do not mask the mean() and sd() functions.)
mean_df <- aggregate(. ~ Name, data, mean)
sd_df <- aggregate(. ~ Name, data, sd)

## Now combine the two data frames on the shared keys "Name" and gene.
## This is the one method I learned from someone on Stack Overflow.
combined <- bind_rows(list(mean = mean_df, sd = sd_df), .id = "stat")

data_mean_sd <- combined %>% 
  pivot_longer(-c(Name, stat), names_to = "Gene", values_to = "value") %>%
  pivot_wider(names_from = "stat", values_from = "value")

The result is correct. But although this is only an example, the real file is large, and the result contains a lot of duplicated data. I hope someone can give me a better method that simplifies my result.

I need your help.

I am not sure, but would the approach below work for you? The last part is basically the same, using pivot_longer and pivot_wider, but for the summarise part I used dplyr::across.

library(dplyr)
library(tidyr)

data<-data.frame(matrix(sample(1:1000,500),20,25))
names(data) <- c(paste0("Gene_", 1:25))
rownames(data)<-NULL
data$Name<-c(rep(paste0("Group_",1:10),each=2))


data %>% 
  group_by(Name) %>% 
  summarise(across(everything(),
                   list(mean = ~ mean(.x),
                        sd = ~ sd(.x)),
                   .names = "{col}__{fn}")) %>% 
  pivot_longer(-c(Name), names_to = "Gene", values_to = "value") %>% 
  separate(., Gene, into = c("Gene", "Stats"), sep = "__") %>% 
  pivot_wider(names_from = Stats, values_from = "value")

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 250 x 4
#>    Name    Gene     mean     sd
#>    <chr>   <chr>   <dbl>  <dbl>
#>  1 Group_1 Gene_1   534. 556.  
#>  2 Group_1 Gene_2   294.  51.6 
#>  3 Group_1 Gene_3   262. 350.  
#>  4 Group_1 Gene_4   615  338.  
#>  5 Group_1 Gene_5    89   43.8 
#>  6 Group_1 Gene_6   322  263.  
#>  7 Group_1 Gene_7   696. 391.  
#>  8 Group_1 Gene_8   182. 101.  
#>  9 Group_1 Gene_9   582  139.  
#> 10 Group_1 Gene_10  184    2.83
#> # ... with 240 more rows

Created on 2021-01-27 by the reprex package (v0.3.0)
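
Two further variants may be worth trying; both are sketches rather than tested recommendations for real FPKM data. The first reshapes to long format before summarising, so mean and sd are computed in a single grouped step and nothing needs to be combined afterwards. The second keeps the across() summary from the answer above but lets pivot_longer split the "{col}__{fn}" column names itself, using the ".value" sentinel in names_to together with names_sep. The fixed seed and the object names long_first and value_split are assumptions added for illustration, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

# Variant 1: go long first, then summarise once per (Name, Gene) pair.
long_first <- data %>%
  pivot_longer(-Name, names_to = "Gene", values_to = "value") %>%
  group_by(Name, Gene) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

# Variant 2: summarise wide with across(), then let pivot_longer split
# "Gene_1__mean" into Gene = "Gene_1" and a column named "mean".
value_split <- data %>%
  group_by(Name) %>%
  summarise(across(everything(),
                   list(mean = ~ mean(.x), sd = ~ sd(.x)),
                   .names = "{col}__{fn}"),
            .groups = "drop") %>%
  pivot_longer(-Name,
               names_to = c("Gene", ".value"),
               names_sep = "__")

long_first
value_split
```

Both results have one row per group and gene with the columns Name, Gene, mean and sd. The row order differs: the first variant sorts Gene as text (Gene_1, Gene_10, Gene_11, ...), while the second keeps the original column order.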

