How can you loop this higher-order function in R?

This question relates to the reply I received here, which came with a nice little function from thelatemail. The data frame I'm using is not optimal, but it's what I've got, and I'm simply trying to loop this function across all rows.

This is my df:

dput(SO_Example_v1)
structure(list(Type = structure(c(3L, 1L, 2L), .Label = c("Community", 
"Contaminant", "Healthcare"), class = "factor"), hosp1_WoundAssocType = c(464L, 
285L, 24L), hosp1_BloodAssocType = c(73L, 40L, 26L), hosp1_UrineAssocType = c(75L, 
37L, 18L), hosp1_RespAssocType = c(137L, 77L, 2L), hosp1_CathAssocType = c(80L, 
34L, 24L), hosp2_WoundAssocType = c(171L, 115L, 17L), hosp2_BloodAssocType = c(127L, 
62L, 12L), hosp2_UrineAssocType = c(50L, 29L, 6L), hosp2_RespAssocType = c(135L, 
142L, 6L), hosp2_CathAssocType = c(95L, 24L, 12L)), .Names = c("Type", 
"hosp1_WoundAssocType", "hosp1_BloodAssocType", "hosp1_UrineAssocType", 
"hosp1_RespAssocType", "hosp1_CathAssocType", "hosp2_WoundAssocType", 
"hosp2_BloodAssocType", "hosp2_UrineAssocType", "hosp2_RespAssocType", 
"hosp2_CathAssocType"), class = "data.frame", row.names = c(NA, 
-3L))
####################
#what it looks like#
####################
require(dplyr)
df <- tbl_df(SO_Example_v1)
head(df)
         Type hosp1_WoundAssocType hosp1_BloodAssocType hosp1_UrineAssocType
1  Healthcare                  464                   73                   75
2   Community                  285                   40                   37
3 Contaminant                   24                   26                   18
Variables not shown: hosp1_RespAssocType (int), hosp1_CathAssocType (int), hosp2_WoundAssocType
  (int), hosp2_BloodAssocType (int), hosp2_UrineAssocType (int), hosp2_RespAssocType (int),
  hosp2_CathAssocType (int)

The function I have performs a chisq.test() across all categories in df$Type. Ideally the function should switch to fisher.test() if a cell count is below 5, but that's a separate issue (extra brownie points to whoever works out how to do that, though).

This is the function I'm using to go row by row:

func <- Map(
  function(x,y) {
    out <- cbind(x,y)
    final <- rbind(out[1,],colSums(out[2:3,]))
    chisq <- chisq.test(final,correct=FALSE)
    chisq$p.value
  },
  SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
  SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))] 
)
func

But ideally, I'd want it to be something like this:

for(i in 1:nrow(df)){func}

But that doesn't work. A further catch is that when, for example, row two is taken, the final call has to look like this:

func <- Map(
  function(x,y) {
    out <- cbind(x,y)
    final <- rbind(out[2,],colSums(out[c(1,3),]))
    chisq <- chisq.test(final,correct=FALSE)
    chisq$p.value
  },
  SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
  SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))] 
)
func

So the function should understand that the row it takes for out[x,] has to be excluded from colSums(). This data.frame only has 3 rows, so it's easy to do by hand, but I've tried applying this function to another data.frame I have with more than 200 rows, so it would be nice to be able to loop this somehow.

Any help appreciated.

Cheers

You were missing two things:

  1. To select line i, and all lines except that one, use u[i] and u[-i] (for the rows of a matrix, u[i, ] and u[-i, ]).
  2. If an argument given to Map is shorter than the others, it is recycled, a very general property of the language. You then just have to add an argument to the function for the line you want to compare against the others; being a single value, it will be recycled for all the items of the vectors passed.
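Both points can be seen on a small scale. The sketch below uses the first hosp1/hosp2 column pair from the example data for the indexing part; the lists x_cols and y_cols and the anonymous helper passed to Map are toy values made up only to show recycling.

```r
# Negative indexing on a matrix: out[i, ] is row i, out[-i, ] is every other row.
# Counts are hosp1_WoundAssocType / hosp2_WoundAssocType from the example data.
out <- cbind(x = c(464L, 285L, 24L), y = c(171L, 115L, 17L))
i <- 2
out[i, ]                                    # x = 285, y = 115
pooled <- colSums(out[-i, , drop = FALSE])  # x = 488, y = 188
pooled

# Recycling in Map: the length-1 argument `i` is reused for every element.
# x_cols, y_cols and the helper are toy values, made up for illustration.
x_cols <- list(a = c(1, 2, 3), b = c(4, 5, 6))
y_cols <- list(a = c(7, 8, 9), b = c(10, 11, 12))
recycled <- Map(function(x, y, line) x[line] + y[line], x_cols, y_cols, i)
recycled  # $a is 10, $b is 16
```

The drop = FALSE matters once only one other row is left: without it, out[-i, ] collapses to a plain vector and colSums() fails.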

The following does what you asked for:

    # the function doing the stats: row lineComp versus the pooled
    # counts of all the other rows, for one hosp1/hosp2 column pair
    FisherOrChisq <- function(x, y, lineComp) {
        out <- cbind(x, y)
        # drop = FALSE keeps a matrix even if only one other row remains
        final <- rbind(out[lineComp, ], colSums(out[-lineComp, , drop = FALSE]))
        test <- chisq.test(final, correct = FALSE)

        return(test$p.value)
    }

    # the hosp1 and hosp2 columns, in matching order
    hosp1 <- SO_Example_v1[grepl("^hosp1", names(SO_Example_v1))]
    hosp2 <- SO_Example_v1[grepl("^hosp2", names(SO_Example_v1))]

    # test of the stat function: first column pair, row 2
    FisherOrChisq(hosp1[[1]], hosp2[[1]], 2)

    # making the loop: one row of p-values per line of the data frame
    result <- NULL
    for (line in seq_len(nrow(SO_Example_v1))) {
        # `line` has length 1, so Map recycles it for every column pair
        res <- unlist(Map(FisherOrChisq, hosp1, hosp2, line))
        result <- rbind(result, res)
    }
    colnames(result) <- gsub("^hosp[0-9]+_", "", colnames(result))
    rownames(result) <- as.character(SO_Example_v1$Type)
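As written, FisherOrChisq() always runs chisq.test(), whatever its name suggests. Below is a minimal sketch of the switch the question asks for. It assumes the common rule of thumb of falling back to fisher.test() when any expected count of the 2x2 table is below 5 (the question says "cell count"; using expected counts, and the hypothetical min_expected parameter, are assumptions). It is tried on the RespAssocType columns of the example data, where the Contaminant row is small.

```r
# Sketch: compare row lineComp against the pooled other rows, and use
# Fisher's exact test when the chi-squared approximation is doubtful.
# ASSUMPTION: "doubtful" means any expected count below min_expected (default 5).
FisherOrChisq <- function(x, y, lineComp, min_expected = 5) {
  out <- cbind(x, y)
  final <- rbind(out[lineComp, ], colSums(out[-lineComp, , drop = FALSE]))
  # expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(final), colSums(final)) / sum(final)
  if (any(expected < min_expected)) {
    fisher.test(final)$p.value
  } else {
    chisq.test(final, correct = FALSE)$p.value
  }
}

# hosp1_RespAssocType / hosp2_RespAssocType counts from the example data
x <- c(137L, 77L, 2L)
y <- c(135L, 142L, 6L)
# one p-value per row; row 3 (Contaminant) has an expected count below 5,
# so it goes through fisher.test(), rows 1 and 2 through chisq.test()
p <- vapply(seq_along(x), function(i) FisherOrChisq(x, y, i), numeric(1))
p
```

Computing the expected counts directly with outer() avoids calling chisq.test() just for the check, which would print its own "approximation may be incorrect" warning on the small tables. The function drops into the Map() loop above unchanged.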

That said, what you are doing is very heavy multiple testing. I would be extremely cautious with the corresponding p-values; at the very least you need a multiple-testing correction such as the one suggested here.
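Whatever correction is chosen, base R's p.adjust() can be applied to the whole p-value matrix at once. A minimal sketch, assuming Benjamini-Hochberg (method = "BH") as the correction; to run on its own it recomputes a small result matrix from two column pairs of the example data, and the helper name pval is made up for illustration.

```r
# Two column pairs from the example data (counts per Type, in row order
# Healthcare, Community, Contaminant)
hosp1 <- list(Wound = c(464L, 285L, 24L), Blood = c(73L, 40L, 26L))
hosp2 <- list(Wound = c(171L, 115L, 17L), Blood = c(127L, 62L, 12L))

# Same statistic as in the answer: row i versus the pooled other rows
pval <- function(x, y, i) {
  out <- cbind(x, y)
  final <- rbind(out[i, ], colSums(out[-i, , drop = FALSE]))
  chisq.test(final, correct = FALSE)$p.value
}

# Raw p-values: rows are Types, columns are column pairs
result <- t(vapply(1:3, function(i) unlist(Map(pval, hosp1, hosp2, i)), numeric(2)))
dimnames(result) <- list(c("Healthcare", "Community", "Contaminant"), names(hosp1))

# p.adjust() returns a plain vector, so assign into a copy to keep the matrix shape.
# ASSUMPTION: Benjamini-Hochberg; methods such as "holm" or "bonferroni" also exist.
adjusted <- result
adjusted[] <- p.adjust(result, method = "BH")
adjusted
```

Passing the whole matrix (rather than one row or column at a time) makes the correction account for every test that was run.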
