How to apply a function to grouped rows between 2 data frames?

Question

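I have 2 data frames of genetic data, and I want to run a hypergeometric test (using the GeneOverlap package as the test function) between all phenotypes in my 2 datasets. I am trying to automate this process and store the results for each phenotype in a new data frame, but I am stuck on running the function automatically across all phenotypes in both data frames.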

My datasets look like this:

Dataset 1:

Gene      Gene_count   Phenotype
Gene1          5       Phenotype1
Gene1          5       Phenotype2
Gene2          3       Phenotype1
Gene3         16       Phenotype6
Gene3.        16       Phenotype2
Gene3         16       Phenotype1

Dataset 2:

Gene    Gene_count     Phenotype
Gene1         10       Phenotype1
Gene1         10       Phenotype2
Gene4         4        Phenotype1
Gene2         17       Phenotype6
Gene6         3        Phenotype2
Gene7         2        Phenotype1

Currently I run one hypergeometric test at a time, which looks like this:

dataset1_pheno1 <- dataset1  %>%
  filter(str_detect(Phenotype, 'Phenotype1'))

dataset2_pheno1 <- dataset2  %>%
  filter(str_detect(Phenotype, 'Phenotype1'))

go.obj <- newGeneOverlap(dataset1_pheno1$Gene, 
                         dataset2_pheno1$Gene,
                         genome.size=1871)
go.obj <- testGeneOverlap(go.obj)
go.obj

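I want to repeat this for every phenotype in the 2 datasets. So far I have been trying to use the group_by() function in dplyr and then run the GeneOverlap functions inside it, but I have not been able to get this to work. What functions can I use to group the rows of both datasets by a column and then run the functions one group at a time?

Example input data: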

library(GeneOverlap)
library(dplyr)
library(stringr)

dataset1 <- structure(list(Gene = c("Gene1", "Gene1", "Gene2", "Gene3", "Gene3.", 
"Gene3"), Gene_count = c(5L, 5L, 3L, 16L, 16L, 16L), Phenotype = c("Phenotype1", 
"Phenotype2", "Phenotype1", "Phenotype6", "Phenotype2", "Phenotype1"
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))


dataset2 <- structure(list(Gene = c("Gene1", "Gene1", "Gene4", "Gene2", "Gene6", 
"Gene7"), Gene_count = c(10L, 10L, 4L, 17L, 3L, 2L), Phenotype = c("Phenotype1", 
"Phenotype2", "Phenotype1", "Phenotype6", "Phenotype2", "Phenotype1"
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))

Answer 1

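You can split each dataset into a list by Phenotype and then use Map to run the test on each pair of subsets. Note, however, that both datasets must contain the same unique phenotypes in the same order, because Map pairs the list elements by position. In other words, all(names(d1_split) == names(d2_split)) must be TRUE.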

d1_split <- split(dataset1, dataset1$Phenotype)
d2_split <- split(dataset2, dataset2$Phenotype)

# this should be TRUE in order for Map to work correctly
all(names(d1_split) == names(d2_split))

tests <- Map(function(d1, d2) {
  go.obj <- newGeneOverlap(d1$Gene, d2$Gene, genome.size = 1871)
  return(testGeneOverlap(go.obj))
}, d1_split, d2_split)
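
If the two datasets do not share exactly the same phenotypes, or if you want the results in a single data frame as the question asks, something like the following may help. It is only a sketch: it assumes the example data from the question (rebuilt here as plain data frames so the block runs on its own), restricts both lists to the phenotypes found in both datasets, and uses GeneOverlap's accessor functions (getPval, getOddsRatio, getJaccard, getIntersection) to pull out the values. The column names in results are my own choice, not anything the package defines.

```r
library(GeneOverlap)

# Example data from the question, as plain data frames
dataset1 <- data.frame(
  Gene = c("Gene1", "Gene1", "Gene2", "Gene3", "Gene3.", "Gene3"),
  Phenotype = c("Phenotype1", "Phenotype2", "Phenotype1",
                "Phenotype6", "Phenotype2", "Phenotype1")
)
dataset2 <- data.frame(
  Gene = c("Gene1", "Gene1", "Gene4", "Gene2", "Gene6", "Gene7"),
  Phenotype = c("Phenotype1", "Phenotype2", "Phenotype1",
                "Phenotype6", "Phenotype2", "Phenotype1")
)

d1_split <- split(dataset1, dataset1$Phenotype)
d2_split <- split(dataset2, dataset2$Phenotype)

# Keep only phenotypes present in both datasets; indexing both lists by
# the same name vector guarantees that they line up for Map
shared <- intersect(names(d1_split), names(d2_split))

tests <- Map(function(d1, d2) {
  go.obj <- newGeneOverlap(d1$Gene, d2$Gene, genome.size = 1871)
  testGeneOverlap(go.obj)
}, d1_split[shared], d2_split[shared])

# One row per phenotype (column names are illustrative)
results <- data.frame(
  Phenotype  = names(tests),
  n_overlap  = vapply(tests, function(x) length(getIntersection(x)), integer(1)),
  p_value    = vapply(tests, getPval, numeric(1)),
  odds_ratio = vapply(tests, getOddsRatio, numeric(1)),
  jaccard    = vapply(tests, getJaccard, numeric(1)),
  row.names  = NULL
)
results
```

Phenotypes that occur in only one dataset are silently dropped here. If you would rather be told about them, compare shared against names(d1_split) and names(d2_split) with setdiff() before running the tests.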
