How to apply a function to grouped rows between 2 dataframes?
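I have 2 data frames of genetic data, and I want to run a hypergeometric test (using the GeneOverlap package as the test function) between all of the phenotypes in my 2 datasets. I am trying to automate this and store the result for each phenotype in a new data frame, but I am stuck on running the function automatically for every phenotype across both data frames.
My datasets look like this:
Dataset 1: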
Gene Gene_count Phenotype
Gene1 5 Phenotype1
Gene1 5 Phenotype2
Gene2 3 Phenotype1
Gene3 16 Phenotype6
Gene3. 16 Phenotype2
Gene3 16 Phenotype1
Dataset 2:
Gene Gene_count Phenotype
Gene1 10 Phenotype1
Gene1 10 Phenotype2
Gene4 4 Phenotype1
Gene2 17 Phenotype6
Gene6 3 Phenotype2
Gene7 2 Phenotype1
Currently I run the hypergeometric test for one phenotype at a time, like this:
dataset1_pheno1 <- dataset1 %>%
filter(str_detect(Phenotype, 'Phenotype1'))
dataset2_pheno1 <- dataset2 %>%
filter(str_detect(Phenotype, 'Phenotype1'))
go.obj <- newGeneOverlap(dataset1_pheno1$Gene,
dataset2_pheno1$Gene,
genome.size=1871)
go.obj <- testGeneOverlap(go.obj)
go.obj
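I would like to repeat this for every phenotype in the 2 datasets. So far I have tried using the group_by() function in dplyr and then running the GeneOverlap functions inside it, but I have not been able to get this to work. Which functions can I use to group the rows of both datasets by a column and then run the test one group at a time?
Example input data: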
library(GeneOverlap)
library(dplyr)
library(stringr)
dataset1 <- structure(list(Gene = c("Gene1", "Gene1", "Gene2", "Gene3", "Gene3.",
"Gene3"), Gene_count = c(5L, 5L, 3L, 16L, 16L, 16L), Phenotype = c("Phenotype1",
"Phenotype2", "Phenotype1", "Phenotype6", "Phenotype2", "Phenotype1"
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))
dataset2 <- structure(list(Gene = c("Gene1", "Gene1", "Gene4", "Gene2", "Gene6",
"Gene7"), Gene_count = c(10L, 10L, 4L, 17L, 3L, 2L), Phenotype = c("Phenotype1",
"Phenotype2", "Phenotype1", "Phenotype6", "Phenotype2", "Phenotype1"
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))
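You can split each dataset into a list by Phenotype and then use Map to run the test on each pair of list elements. Note, however, that Map pairs the elements by position, so both datasets must contain the same unique phenotypes in the same order. In other words, all(names(d1_split) == names(d2_split)) must be TRUE.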
d1_split <- split(dataset1, dataset1$Phenotype)
d2_split <- split(dataset2, dataset2$Phenotype)
# this should be TRUE in order for Map to work correctly
all(names(d1_split) == names(d2_split))
tests <- Map(function(d1, d2) {
go.obj <- newGeneOverlap(d1$Gene, d2$Gene, genome.size = 1871)
return(testGeneOverlap(go.obj))
}, d1_split, d2_split)
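If the two datasets do not share exactly the same phenotypes, one possible workaround is to restrict both lists to the phenotypes they have in common before calling Map, and then collect the results into a data frame as the question asks. The following is only a sketch under some assumptions: the column names p_value, odds_ratio and n_shared are made up for illustration, the toy data is a trimmed copy of the question's example, and the accessors getPval(), getOddsRatio() and getIntersection() are used as documented in the GeneOverlap package.

```r
library(GeneOverlap)

# Toy data with the same Gene/Phenotype columns as in the question
dataset1 <- data.frame(
  Gene = c("Gene1", "Gene1", "Gene2", "Gene3", "Gene3.", "Gene3"),
  Phenotype = c("Phenotype1", "Phenotype2", "Phenotype1",
                "Phenotype6", "Phenotype2", "Phenotype1")
)
dataset2 <- data.frame(
  Gene = c("Gene1", "Gene1", "Gene4", "Gene2", "Gene6", "Gene7"),
  Phenotype = c("Phenotype1", "Phenotype2", "Phenotype1",
                "Phenotype6", "Phenotype2", "Phenotype1")
)

# Split only the gene names, giving one character vector per phenotype
d1_split <- split(dataset1$Gene, dataset1$Phenotype)
d2_split <- split(dataset2$Gene, dataset2$Phenotype)

# Keep phenotypes present in both datasets so the pairs line up by name
common <- intersect(names(d1_split), names(d2_split))

# Run the overlap test once per shared phenotype
tests <- Map(function(g1, g2) {
  testGeneOverlap(newGeneOverlap(g1, g2, genome.size = 1871))
}, d1_split[common], d2_split[common])

# One row per phenotype (column names are illustrative, not from the question)
results <- data.frame(
  Phenotype  = common,
  p_value    = unname(vapply(tests, getPval, numeric(1))),
  odds_ratio = unname(vapply(tests, getOddsRatio, numeric(1))),
  n_shared   = unname(vapply(tests, function(x) length(getIntersection(x)),
                             integer(1))),
  row.names  = NULL
)
results
```

Indexing both lists with common also fixes the ordering, so the all(names(...) == names(...)) check is no longer needed. Phenotypes that appear in only one dataset are silently dropped, which may or may not be what you want.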