How to calculate the mean and sd of FPKM gene counts by group and combine the mean and sd into one data frame?

Question

Luckily, the first step, calculating the mean and sd by group, is already done. I now have the mean and sd results separately, and what I want to do is combine them. It does not matter how easy or difficult the combination method is, but the combined data frame should be simple.

Here is my calculation method and the only combination method I know. I need a different way to combine the results, please. My sample data and code are below:

library(dplyr)
library(tidyr)

data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
rownames(data) <- NULL
data$Name <- rep(paste0("Group_", 1:10), each = 2)
        
unique(data$Name)
## 1. group_by: only gives the result for one column each time.
mm <- data %>%
  group_by(Name) %>%
  summarise(mean = mean(Gene_1))
mm

## 2. tapply: gives the mean by group, but only for one column each time.
mm <- data.frame(mean_Gene_1 = tapply(data[, "Gene_1"], data$Name, mean))
mm

## 3. aggregate: a powerful function that gives the result for all columns by group.
mm <- aggregate(. ~ Name, data, mean)
mm
        
## Get the mean and sd data frames.
## (Named mean_df / sd_df so they do not mask the mean() and sd() functions.)
mean_df <- aggregate(. ~ Name, data, mean)
sd_df <- aggregate(. ~ Name, data, sd)

## Now combine the two data frames using the shared keys "Name" and "Gene".
## I learned just this one method from somebody on Stack Overflow.
combined <- bind_rows(list(mean = mean_df, sd = sd_df), .id = "stat")

data_mean_sd <- combined %>%
  pivot_longer(-c(Name, stat), names_to = "Gene", values_to = "value") %>%
  pivot_wider(names_from = "stat", values_from = "value")

The result is correct. But my real file is large, even though this is only an example, and it includes a lot of duplicated data. I hope somebody can give me a better method to simplify my result.

I need your help.

Answer 1

I am not sure, but would the approach below work for you? The last part is basically the same, using pivot_longer and pivot_wider, but for the summarise part I used dplyr::across.

library(dplyr)
library(tidyr)

data<-data.frame(matrix(sample(1:1000,500),20,25))
names(data) <- c(paste0("Gene_", 1:25))
rownames(data)<-NULL
data$Name<-c(rep(paste0("Group_",1:10),each=2))


data %>% 
  group_by(Name) %>% 
  summarise(across(everything(),
                   list(mean = ~ mean(.x),
                        sd = ~ sd(.x)),
                   .names = "{col}__{fn}")) %>% 
  pivot_longer(-c(Name), names_to = "Gene", values_to = "value") %>% 
  separate(., Gene, into = c("Gene", "Stats"), sep = "__") %>% 
  pivot_wider(names_from = Stats, values_from = "value")

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 250 x 4
#>    Name    Gene     mean     sd
#>    <chr>   <chr>   <dbl>  <dbl>
#>  1 Group_1 Gene_1   534. 556.  
#>  2 Group_1 Gene_2   294.  51.6 
#>  3 Group_1 Gene_3   262. 350.  
#>  4 Group_1 Gene_4   615  338.  
#>  5 Group_1 Gene_5    89   43.8 
#>  6 Group_1 Gene_6   322  263.  
#>  7 Group_1 Gene_7   696. 391.  
#>  8 Group_1 Gene_8   182. 101.  
#>  9 Group_1 Gene_9   582  139.  
#> 10 Group_1 Gene_10  184    2.83
#> # ... with 240 more rows

Created on 2021-01-27 by the reprex package (v0.3.0)
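
If the separate() step feels roundabout, one possible variant is to reshape to long format first and then summarise once per Name/Gene pair. This is only a sketch, assuming dplyr >= 1.0 and tidyr >= 1.0 are installed; the set.seed() call is my addition so the example is reproducible and is not part of the original data setup.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: added only to make the sample data reproducible
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

data_mean_sd <- data %>%
  # one row per (Name, Gene, value) observation
  pivot_longer(-Name, names_to = "Gene", values_to = "value") %>%
  group_by(Name, Gene) %>%
  # one row per (Name, Gene) pair with both statistics
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

head(data_mean_sd)
```

This should give the same Name / Gene / mean / sd layout as above without encoding the statistic in the column names. Note that the rows come out sorted by Name and Gene as character strings (Gene_1, Gene_10, Gene_11, ...), so the row order differs from the across() version.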
