Apply a function to the factors of each column, for all columns of a data frame
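I have a data frame with 6 columns. Each of the first 4 columns is a factor with 2 levels. I want to write a function (or a for loop) that runs a test (e.g. wilcox.test) on the values in columns pc1 and pc2, comparing the two levels of each factor column.
If I were to do it manually: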
wilcox.test(df[df$g1=="bm",5],df[df$g1!="bm",5])
wilcox.test(df[df$g1=="bm",6],df[df$g1!="bm",6])
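How can I get the p-value of each test stored in a data frame whose rows correspond to the first 4 columns of df and whose columns are pc1 and pc2?
I tried this, but it is not correct: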
mutate_if(df[,head(colnames(df),-2)], is.character, as.factor) %>% #check whether 4 first columns are as factor
lapply(.,
function(x) {
df = data.frame(row.names = head(colnames(df),-2))
names(df) = c("pc1", "pc2")
df$pc1 = wilcox.test(df[df$g1=="bm",5],df[df$g1!="bm",5])
df$pc2 = wilcox.test(df[df$g1=="bm",6],df[df$g1!="bm",6])
return(df)
}
)
My data frame:
> dput(df)
structure(list(g1 = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L,
1L, 1L), .Label = c("bm", "cm"), class = "factor"), g2 = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("ct", "ft"), class = "factor"),
g3 = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("bn",
"un"), class = "factor"), g4 = structure(c(2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L), .Label = c("ls", "vp"), class = "factor"),
pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.8, 0.34,
0.04), pc2 = c(0.04, 0.9, 0.68, 0.54, 0.92, 0.36, 0.3, 0.62,
0.84, 0.96)), class = "data.frame", row.names = c(NA, -10L
))
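The following may give you some ideas on how to approach this.
(I have not generalized it to all tests, because I am not sure whether every test stores p.value in the same place.)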
library(dplyr)
library(tidyr)
lapply(which(sapply(df, is.factor)),
       function(i) df[, c(i, 5, 6)] %>%
         # set column names & extract group values into a separate label
         # so that the subsequent code can be used for all four columns
         # (the label's wording can be changed as desired)
         setNames(c("group", "pc1", "pc2")) %>%
         filter(!is.na(group)) %>% # filter out NA rows
         mutate(label = paste0("Column ", i, ": ",
                               paste0(unique(as.character(group)),
                                      collapse = " vs "))) %>%
         mutate(group = paste0("group", as.integer(group))) %>%
         # pivot data such that each group of pc1 / pc2 values is in its own column
         group_by(group) %>%
         mutate(id = seq(1, n())) %>%
         pivot_wider(id_cols = c(label, id),
                     names_from = group,
                     values_from = c(pc1, pc2)) %>%
         # perform separate tests on pc1 & pc2, and extract p-value in each case
         summarise(label = unique(label),
                   pc1 = wilcox.test(pc1_group1, pc1_group2)$p.value,
                   pc2 = wilcox.test(pc2_group1, pc2_group2)$p.value)) %>%
  # combine results from each group
  data.table::rbindlist()
# result:
                label       pc1       pc2
1: Column 1: bm vs cm 1.0000000 1.0000000
2: Column 2: ct vs ft 0.6904762 0.8412698
3: Column 3: un vs bn 0.8412698 1.0000000
4: Column 4: vp vs ls 0.6904762 0.5476190
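For comparison, here is a minimal base R sketch of the same idea that returns the shape asked for in the question: one row per grouping column, one column each for pc1 and pc2. The helper name pairwise_p and its arguments are hypothetical, introduced only for illustration. It assumes that every grouping column has exactly two levels and that the chosen test returns an object with a p.value element, which holds for wilcox.test and t.test but has not been checked for other tests. The data frame is rebuilt from the dput output above so that the block runs on its own.

```r
# Example data, rebuilt from the dput() output in the question
df <- data.frame(
  g1  = factor(c("bm", "bm", "cm", "cm", "cm", "cm", "bm", "cm", "bm", "bm")),
  g2  = factor(c("ct", "ct", "ct", "ct", "ft", "ft", "ft", "ft", "ft", "ct")),
  g3  = factor(c("un", "un", "un", "bn", "bn", "bn", "un", "bn", "bn", "un")),
  g4  = factor(c("vp", "vp", "ls", "ls", "ls", "ls", "ls", "vp", "vp", "vp")),
  pc1 = c(0.86, 0.54, 0.06, 0.88, 0.62, 0.14, 0.94, 0.8, 0.34, 0.04),
  pc2 = c(0.04, 0.9, 0.68, 0.54, 0.92, 0.36, 0.3, 0.62, 0.84, 0.96)
)

# Hypothetical helper: for every (grouping column, value column) pair, split the
# values by the two factor levels, run the test, and keep only the p-value.
# Assumes `test` returns an object with a `p.value` element.
pairwise_p <- function(dat, group_cols, value_cols, test = wilcox.test) {
  p <- sapply(value_cols, function(v) {
    sapply(group_cols, function(g) {
      parts <- split(dat[[v]], droplevels(as.factor(dat[[g]])))
      stopifnot(length(parts) == 2)  # sketch only handles two-level factors
      test(parts[[1]], parts[[2]])$p.value
    })
  })
  # rows are named after the grouping columns, columns after the value columns
  as.data.frame(p)
}

res <- pairwise_p(df,
                  group_cols = names(df)[sapply(df, is.factor)],
                  value_cols = c("pc1", "pc2"))
print(res)
```

On this data, res should have rows g1 to g4 and columns pc1 and pc2, holding the same p-values as the table above. Passing test = t.test swaps the test without changing anything else, subject to the p.value assumption.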