Apply a function to the factors of each column for all columns of a data frame

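I have a data frame with 6 columns. Each of the first 4 columns is a factor with 2 levels. I want to write a function (or a for loop) that performs a test (e.g. wilcox.test) between the two levels of each of these columns, on the values in columns pc1 and pc2.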

If I were to do it manually (shown here for column g1):

wilcox.test(df[df$g1=="bm",5],df[df$g1!="bm",5])
wilcox.test(df[df$g1=="bm",6],df[df$g1!="bm",6])
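Each call above prints a full test report. As a minimal sketch using only base R's stats package, the number itself can be pulled out: wilcox.test() returns an object of class "htest", whose p-value is stored in its $p.value element. The formula interface (pc1 ~ g1) runs the same two-sample test without manual subsetting. The data frame is rebuilt from the question's dput() output so the snippet runs on its own; p_manual and p_formula are illustrative names only.

```r
# Rebuild the question's data frame (same values and factor levels as its dput() output)
df <- data.frame(
  g1  = factor(c("bm", "bm", "cm", "cm", "cm", "cm", "bm", "cm", "bm", "bm")),
  g2  = factor(c("ct", "ct", "ct", "ct", "ft", "ft", "ft", "ft", "ft", "ct")),
  g3  = factor(c("un", "un", "un", "bn", "bn", "bn", "un", "bn", "bn", "un")),
  g4  = factor(c("vp", "vp", "ls", "ls", "ls", "ls", "ls", "vp", "vp", "vp")),
  pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.80, 0.34, 0.04),
  pc2 = c(0.04, 0.90, 0.68, 0.54, 0.92, 0.36, 0.30, 0.62, 0.84, 0.96)
)

# Manual subsetting as in the question, keeping only the p-value
p_manual <- wilcox.test(df[df$g1 == "bm", 5], df[df$g1 != "bm", 5])$p.value

# Formula interface: splits pc1 by the two levels of g1 and runs the same test
p_formula <- wilcox.test(pc1 ~ g1, data = df)$p.value

print(c(manual = p_manual, formula = p_formula))
```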

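How can I get the p.values of all these tests stored in a data frame, with rows corresponding to the first 4 columns of df and columns corresponding to pc1 and pc2?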

I tried this, but it isn't correct:

mutate_if(df[,head(colnames(df),-2)], is.character, as.factor) %>% #check whether 4 first columns are as factor
  lapply(.,
  function(x) {
    df = data.frame(row.names = head(colnames(df),-2))
         names(df) = c("pc1", "pc2")
         df$pc1 = wilcox.test(df[df$g1=="bm",5],df[df$g1!="bm",5])
         df$pc2 = wilcox.test(df[df$g1=="bm",6],df[df$g1!="bm",6])
         return(df)
       }
)
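The attempt above has several problems. Inside the anonymous function, df is reassigned to a new data frame with row names but zero columns, so names(df) = c("pc1", "pc2") fails (two names for no columns). Even if that line worked, the later df$g1 lookups would refer to this empty data frame rather than the data, the whole htest object is assigned instead of its $p.value, and the argument x is never used, so every iteration would repeat the g1 test. A base-R sketch that instead loops over the factor columns and the two value columns, producing the requested table (rows g1 to g4, columns pc1 and pc2), might look like this. The helper p_by_levels() is hypothetical, written only for this example, and assumes each factor has exactly two levels in use.

```r
# Rebuild the question's data frame
df <- data.frame(
  g1  = factor(c("bm", "bm", "cm", "cm", "cm", "cm", "bm", "cm", "bm", "bm")),
  g2  = factor(c("ct", "ct", "ct", "ct", "ft", "ft", "ft", "ft", "ft", "ct")),
  g3  = factor(c("un", "un", "un", "bn", "bn", "bn", "un", "bn", "bn", "un")),
  g4  = factor(c("vp", "vp", "ls", "ls", "ls", "ls", "ls", "vp", "vp", "vp")),
  pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.80, 0.34, 0.04),
  pc2 = c(0.04, 0.90, 0.68, 0.54, 0.92, 0.36, 0.30, 0.62, 0.84, 0.96)
)

# Hypothetical helper: wilcox.test() p-value comparing `y` between the two levels of `g`
p_by_levels <- function(y, g) {
  g <- droplevels(as.factor(g))
  stopifnot(nlevels(g) == 2)
  lv <- levels(g)
  wilcox.test(y[g == lv[1]], y[g == lv[2]])$p.value
}

factor_cols <- names(df)[sapply(df, is.factor)]  # "g1" "g2" "g3" "g4"
value_cols  <- c("pc1", "pc2")

# Inner sapply gives c(pc1 = ..., pc2 = ...) for one factor column;
# the outer sapply stacks these as columns, so transpose to get factors as rows
p_values <- as.data.frame(t(sapply(factor_cols, function(f) {
  sapply(value_cols, function(v) p_by_levels(df[[v]], df[[f]]))
})))

print(p_values)
```

sapply() is used because both loops return simple numeric vectors, which it simplifies into a matrix with the column names carried over as dimnames.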

My data frame:

> dput(df)
structure(list(g1 = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
1L, 1L), .Label = c("bm", "cm"), class = "factor"), g2 = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("ct", "ft"), class = "factor"), 
    g3 = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("bn", 
    "un"), class = "factor"), g4 = structure(c(2L, 2L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L), .Label = c("ls", "vp"), class = "factor"), 
    pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.8, 0.34, 
    0.04), pc2 = c(0.04, 0.9, 0.68, 0.54, 0.92, 0.36, 0.3, 0.62, 
    0.84, 0.96)), class = "data.frame", row.names = c(NA, -10L
))

The following might give you some ideas on how to solve this:

(I haven't generalized this to all tests, because I'm not sure whether every test stores its p.value in the same place.)
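On that point: the two-sample tests in base R's stats package, such as wilcox.test() and t.test(), return objects of class "htest", which store the p-value in the $p.value element. Assuming the test of interest follows that convention, the test function itself can be passed as an argument. The sketch below uses a hypothetical helper p_table() (not from any package) that forwards extra arguments through ... to the chosen test.

```r
# Rebuild the question's data frame
df <- data.frame(
  g1  = factor(c("bm", "bm", "cm", "cm", "cm", "cm", "bm", "cm", "bm", "bm")),
  g2  = factor(c("ct", "ct", "ct", "ct", "ft", "ft", "ft", "ft", "ft", "ct")),
  g3  = factor(c("un", "un", "un", "bn", "bn", "bn", "un", "bn", "bn", "un")),
  g4  = factor(c("vp", "vp", "ls", "ls", "ls", "ls", "ls", "vp", "vp", "vp")),
  pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.80, 0.34, 0.04),
  pc2 = c(0.04, 0.90, 0.68, 0.54, 0.92, 0.36, 0.30, 0.62, 0.84, 0.96)
)

# Hypothetical helper: one row per factor column, one column per value column.
# Assumes `test(x, y, ...)` returns an "htest" object with a $p.value element.
p_table <- function(data, value_cols, test = wilcox.test, ...) {
  factor_cols <- names(data)[vapply(data, is.factor, logical(1))]
  p <- vapply(factor_cols, function(f) {
    g  <- droplevels(data[[f]])
    lv <- levels(g)
    stopifnot(length(lv) == 2)
    vapply(value_cols, function(v) {
      y <- data[[v]]
      test(y[g == lv[1]], y[g == lv[2]], ...)$p.value
    }, numeric(1))
  }, numeric(length(value_cols)))
  as.data.frame(t(p))  # transpose so that factor columns become rows
}

w  <- p_table(df, c("pc1", "pc2"))                 # Wilcoxon rank-sum p-values
tt <- p_table(df, c("pc1", "pc2"), test = t.test)  # Welch two-sample t-test p-values
print(w)
print(tt)
```

vapply() is used instead of sapply() so that a test returning something other than a single number fails loudly rather than silently changing the shape of the result.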

library(dplyr)
library(tidyr)

lapply(which(sapply(df, is.factor)),
       function(i) df[, c(i, 5, 6)] %>%

         # set column names & extract group values into a separate label
         # so that the subsequent code can be used for all four columns
         # (the label's wording can be changed as desired)
         setNames(c("group", "pc1", "pc2")) %>%
         filter(!is.na(group)) %>% # drop rows where the grouping factor is NA
         mutate(label = paste0("Column ", i, ": ",
                               paste0(unique(as.character(group)),
                                      collapse = " vs "))) %>%
         mutate(group = paste0("group", as.integer(group))) %>%

         # pivot data such that each group of pc1 / pc2 values is in its own column
         group_by(group) %>% 
         mutate(id = seq(1, n())) %>% 
         ungroup() %>%
         pivot_wider(id_cols = c(label, id), 
                     names_from = group, 
                     values_from = c(pc1, pc2)) %>%

         # perform separate tests on pc1 & pc2, and extract p-value in each case
         summarise(label = unique(label),
                   pc1 = wilcox.test(pc1_group1, pc1_group2)$p.value,
                   pc2 = wilcox.test(pc2_group1, pc2_group2)$p.value)) %>%

  # combine results from each group
  data.table::rbindlist()

# result:
                label       pc1       pc2
1: Column 1: bm vs cm 1.0000000 1.0000000
2: Column 2: ct vs ft 0.6904762 0.8412698
3: Column 3: un vs bn 0.8412698 1.0000000
4: Column 4: vp vs ls 0.6904762 0.5476190
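An alternative tidyverse route, sketched here assuming dplyr 1.0 or later (for across() and .groups) and tidyr 1.0 or later, avoids pivoting each column wide: reshape to long format so that each combination of factor column and value column becomes one group, then run one test per group. The factor columns are converted to character first so that factors with different level sets can share a single column. This is a swapped-in approach, not the answer's code.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data frame
df <- data.frame(
  g1  = factor(c("bm", "bm", "cm", "cm", "cm", "cm", "bm", "cm", "bm", "bm")),
  g2  = factor(c("ct", "ct", "ct", "ct", "ft", "ft", "ft", "ft", "ft", "ct")),
  g3  = factor(c("un", "un", "un", "bn", "bn", "bn", "un", "bn", "bn", "un")),
  g4  = factor(c("vp", "vp", "ls", "ls", "ls", "ls", "ls", "vp", "vp", "vp")),
  pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.80, 0.34, 0.04),
  pc2 = c(0.04, 0.90, 0.68, 0.54, 0.92, 0.36, 0.30, 0.62, 0.84, 0.96)
)

p_long <- df %>%
  mutate(across(g1:g4, as.character)) %>%
  pivot_longer(g1:g4, names_to = "column", values_to = "level") %>%
  pivot_longer(c(pc1, pc2), names_to = "variable", values_to = "value") %>%
  group_by(column, variable) %>%
  summarise(p.value = {
    # split() orders the two levels alphabetically; the two-sided
    # p-value does not depend on which level is passed first
    x <- split(value, level)
    wilcox.test(x[[1]], x[[2]])$p.value
  }, .groups = "drop") %>%
  pivot_wider(names_from = variable, values_from = p.value)

print(p_long)
```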

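Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM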