How can you loop this higher-order function in R?
This question relates to the reply I received here with a nice little function from thelatemail. The data frame I'm using is not optimal, but it's what I've got, and I'm simply trying to loop this function across all rows.
This is my df:
dput(SO_Example_v1)
structure(list(Type = structure(c(3L, 1L, 2L), .Label = c("Community",
"Contaminant", "Healthcare"), class = "factor"), hosp1_WoundAssocType = c(464L,
285L, 24L), hosp1_BloodAssocType = c(73L, 40L, 26L), hosp1_UrineAssocType = c(75L,
37L, 18L), hosp1_RespAssocType = c(137L, 77L, 2L), hosp1_CathAssocType = c(80L,
34L, 24L), hosp2_WoundAssocType = c(171L, 115L, 17L), hosp2_BloodAssocType = c(127L,
62L, 12L), hosp2_UrineAssocType = c(50L, 29L, 6L), hosp2_RespAssocType = c(135L,
142L, 6L), hosp2_CathAssocType = c(95L, 24L, 12L)), .Names = c("Type",
"hosp1_WoundAssocType", "hosp1_BloodAssocType", "hosp1_UrineAssocType",
"hosp1_RespAssocType", "hosp1_CathAssocType", "hosp2_WoundAssocType",
"hosp2_BloodAssocType", "hosp2_UrineAssocType", "hosp2_RespAssocType",
"hosp2_CathAssocType"), class = "data.frame", row.names = c(NA,
-3L))
####################
#what it looks like#
####################
require(dplyr)
df <- tbl_df(SO_Example_v1)
head(df)
Type hosp1_WoundAssocType hosp1_BloodAssocType hosp1_UrineAssocType
1 Healthcare 464 73 75
2 Community 285 40 37
3 Contaminant 24 26 18
Variables not shown: hosp1_RespAssocType (int), hosp1_CathAssocType (int), hosp2_WoundAssocType
(int), hosp2_BloodAssocType (int), hosp2_UrineAssocType (int), hosp2_RespAssocType (int),
hosp2_CathAssocType (int)
The function I have performs a chisq.test across all categories in df$Type. Ideally the function should switch to fisher.test() if a cell count is < 5, but that's a separate issue (extra brownie points for whoever comes up with how to do that, though).
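One possible way to earn those brownie points is a small helper that computes the expected counts of the table and falls back to fisher.test() when any of them is below 5. This is a minimal sketch, not part of the original thread: the name chisq_or_fisher is hypothetical, and it assumes the usual rule of thumb applies to expected counts (use any(tab < 5) instead if you really mean observed counts).

```r
# Hypothetical helper (not from the original thread): test a 2-way table with
# chisq.test(), or with fisher.test() when any expected count is < min_expected.
chisq_or_fisher <- function(tab, min_expected = 5) {
  tab <- as.matrix(tab)
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) {
    fisher.test(tab)$p.value
  } else {
    chisq.test(tab, correct = FALSE)$p.value
  }
}

# Healthcare vs. the other two Types for the Wound columns (large counts)
big <- matrix(c(464, 309, 171, 132), nrow = 2)
# Made-up table with small counts, so the Fisher branch is taken
small <- matrix(c(2, 8, 6, 1), nrow = 2)

chisq_or_fisher(big)    # chi-square p-value
chisq_or_fisher(small)  # Fisher exact p-value
```

Computing the expected counts directly avoids the "Chi-squared approximation may be incorrect" warning that chisq.test() would emit before you got a chance to switch.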
This is the function I'm using to go row by row:
func <- Map(
function(x,y) {
out <- cbind(x,y)
final <- rbind(out[1,],colSums(out[2:3,]))
chisq <- chisq.test(final,correct=FALSE)
chisq$p.value
},
SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))]
)
func
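For concreteness, this is the 2x2 table that func builds for the Wound pair when row 1 (Healthcare) is the focal category: that row against the pooled other rows, hospital 1 versus hospital 2. It is a sketch that hard-codes a single column pair, with the counts copied from the dput() output above.

```r
# Wound counts per Type (Healthcare, Community, Contaminant), copied from the dput()
x <- c(464L, 285L, 24L)  # hosp1_WoundAssocType
y <- c(171L, 115L, 17L)  # hosp2_WoundAssocType

out <- cbind(x, y)  # 3 x 2: one row per Type, one column per hospital
# Row 1 (Healthcare) against the column sums of rows 2 and 3
final <- rbind(out[1, ], colSums(out[2:3, ]))
final  # 2 x 2 table: (464, 171) over (285 + 24, 115 + 17) = (309, 132)
chisq.test(final, correct = FALSE)$p.value
```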
But ideally, I'd want it to be something like this:
for(i in 1:nrow(df)){func}
But that doesn't work. A further complication is that when, for example, row two is taken, the final call looks like this:
func <- Map(
function(x,y) {
out <- cbind(x,y)
final <- rbind(out[2,],colSums(out[c(1,3),]))
chisq <- chisq.test(final,correct=FALSE)
chisq$p.value
},
SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))]
)
func
So the function should understand that the row it takes for out[x,] has to be excluded from colSums(). This data.frame only has 3 rows, so it's easy, but I've tried applying this function to a separate data.frame that has more than 200 rows, so it would be nice to be able to loop this somehow.
Any help appreciated.
Cheers
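The for(i in 1:nrow(df)){func} attempt in the question cannot work because func is not a function: Map() has already run, so func is just the list of p-values for row 1, and nothing inside it refers to i. A minimal sketch of the fix is to wrap the Map() call in a function of the row index; the name pvals_for_row is hypothetical, and only the Wound and Blood column pairs are copied in to keep the example short.

```r
# Two of the question's column pairs (Wound and Blood), copied from the dput()
SO_Example_v1 <- data.frame(
  Type = factor(c("Healthcare", "Community", "Contaminant")),
  hosp1_WoundAssocType = c(464L, 285L, 24L),
  hosp1_BloodAssocType = c(73L, 40L, 26L),
  hosp2_WoundAssocType = c(171L, 115L, 17L),
  hosp2_BloodAssocType = c(127L, 62L, 12L)
)

# Hypothetical wrapper: the question's Map() call with the focal row i as an argument
pvals_for_row <- function(df, i) {
  h1 <- df[grepl("^hosp1", names(df))]
  h2 <- df[grepl("^hosp2", names(df))]
  unlist(Map(function(x, y) {
    out <- cbind(x, y)
    # row i against the column sums of every other row
    final <- rbind(out[i, ], colSums(out[-i, , drop = FALSE]))
    chisq.test(final, correct = FALSE)$p.value
  }, h1, h2))
}

# The loop the question asked for: one named vector of p-values per row
res <- vector("list", nrow(SO_Example_v1))
for (i in seq_len(nrow(SO_Example_v1))) {
  res[[i]] <- pvals_for_row(SO_Example_v1, i)
}
names(res) <- as.character(SO_Example_v1$Type)
res
```

Because i is now an argument, the same code covers row 1, row 2, or row 200 without editing the index by hand.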
You were missing two things: to select the i-th row, and to select all rows except that one, you want u[i,] and u[-i,].
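Row selection and its complement are easiest to see on a toy matrix (the values are arbitrary). Note the comma: on a matrix, an index without it picks a single element, not a row.

```r
# Toy 3 x 2 matrix (arbitrary values): column 1 is 1:3, column 2 is 4:6
m <- matrix(1:6, nrow = 3)

m[2, ]            # row 2 only: 2 5
m[-2, ]           # every row except row 2, i.e. rows 1 and 3
colSums(m[-2, ])  # (1 + 3, 4 + 6) = 4 10
m[2]              # no comma: element 2 of the matrix, not row 2

# With only two rows, m2[-1, ] collapses to a plain vector and colSums() errors;
# drop = FALSE keeps it a one-row matrix
m2 <- m[1:2, ]
colSums(m2[-1, , drop = FALSE])  # 2 5
```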
The following does what you asked for:
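# the function doing the stats
# (despite its name, it always runs chisq.test; it never switches to fisher.test)
FisherOrChisq <- function(x,y,lineComp) {
  out <- cbind(x,y)
  # row lineComp against the column sums of all other rows;
  # drop=FALSE keeps a matrix even when only one other row is left
  final <- rbind(out[lineComp,],colSums(out[-lineComp,,drop=FALSE]))
  test <- chisq.test(final,correct=FALSE)
  return(test$p.value)
}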
# test of the stat function
FisherOrChisq(SO_Example_v1[[grep("^hosp1",names(SO_Example_v1))[1]]],
              SO_Example_v1[[grep("^hosp2",names(SO_Example_v1))[1]]],2)
# making the loop
result <- c()
for(line in seq_len(nrow(SO_Example_v1))){
  # unlist() turns Map's list of p-values into a named numeric vector,
  # so result ends up as a numeric matrix rather than a matrix of lists
  res <- unlist(Map(FisherOrChisq,
                    SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
                    SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))],
                    line))
  result <- rbind(result,res)
}
colnames(result) <- gsub("^hosp[0-9]+_","",colnames(result))
rownames(result) <- SO_Example_v1$Type
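An alternative to growing result with rbind() inside the loop is to preallocate a numeric matrix and fill it cell by cell, which also sets the row and column names in one place. This is a sketch on a two-column-pair subset of the data, and it keeps chisq.test() as in the answer.

```r
# Two of the question's column pairs (Wound and Blood), copied from the dput()
SO_Example_v1 <- data.frame(
  Type = factor(c("Healthcare", "Community", "Contaminant")),
  hosp1_WoundAssocType = c(464L, 285L, 24L),
  hosp1_BloodAssocType = c(73L, 40L, 26L),
  hosp2_WoundAssocType = c(171L, 115L, 17L),
  hosp2_BloodAssocType = c(127L, 62L, 12L)
)

h1 <- as.matrix(SO_Example_v1[grepl("^hosp1", names(SO_Example_v1))])
h2 <- as.matrix(SO_Example_v1[grepl("^hosp2", names(SO_Example_v1))])

# Preallocate: one row per Type, one column per site (hosp1_ prefix stripped)
result <- matrix(NA_real_, nrow = nrow(h1), ncol = ncol(h1),
                 dimnames = list(as.character(SO_Example_v1$Type),
                                 sub("^hosp1_", "", colnames(h1))))

for (i in seq_len(nrow(h1))) {
  for (j in seq_len(ncol(h1))) {
    out <- cbind(h1[, j], h2[, j])
    # row i against the column sums of all other rows
    final <- rbind(out[i, ], colSums(out[-i, , drop = FALSE]))
    result[i, j] <- chisq.test(final, correct = FALSE)$p.value
  }
}
result
```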
That said, what you are doing is very heavy multiple testing. I would be extremely cautious with the corresponding p-values: at the very least you need a multiple testing correction, such as the one suggested here.
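As a sketch of what such a correction can look like, base R's p.adjust() adjusts a whole set of p-values in one call. The p-values below are arbitrary placeholders, not results from this data; with the real result matrix you would adjust all of its entries together. Holm controls the family-wise error rate; method = "BH" would control the false discovery rate instead.

```r
# Placeholder p-values (not results from this data): rows = Type, columns = site
pv <- matrix(c(0.001, 0.04, 0.20,
               0.03,  0.50, 0.01),
             nrow = 3,
             dimnames = list(c("Healthcare", "Community", "Contaminant"),
                             c("Wound", "Blood")))

# Adjust all tests as one family, then write back into the same matrix shape
adj <- pv
adj[] <- p.adjust(pv, method = "holm")
adj
```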