
Nested looping through rows/columns in R

I have a data set that looks like this:

ID  RESULT_DATE  Hyperkalemia  Is.Hemolyzed
1   5/27/2008    2             FALSE
1   5/28/2008    2             FALSE
1   5/29/2008    2             FALSE
1   5/29/2008    2             FALSE
1   5/29/2008    3             FALSE
1   5/30/2008    2             FALSE
1   6/15/2008    4             FALSE
1   10/14/2014   1             FALSE
1   10/16/2014   NA            FALSE
2   8/12/2013    2             FALSE
3   2/26/2012    2             FALSE
3   2/27/2012    2             FALSE
3   4/18/2012    3             FALSE
3   4/18/2012    4             FALSE
3   4/21/2012    4             FALSE
3   4/23/2012    4             FALSE
3   4/27/2012    4             FALSE
3   5/8/2012     4             FALSE
3   5/12/2012    4             FALSE
3   5/15/2012    4             FALSE
3   5/15/2012    NA            FALSE

I want to find the number of times a potassium test with a Hyperkalemia score of 3 or 4 and Is.Hemolyzed = FALSE was repeated on the same day (repeats must be counted per patient ID). The objective is the total number of times a test qualified for a repeat, and then the total number of times the repeat actually occurred.

Could someone help me translate my pseudocode into working R code?

    # data.frame = pots
    # for every row (sorted by patient and result date)
    for (i in 1:nrow(pots)) {

      # for each patient (sorted by result date)
      # how do I count the rows for the individual patient?
      for (i in 1:length(pots$ID)) {

        # assign result date to use for calculation
        result_date = pots$RESULT_DATE

        # if Hyperkalemia = 3 or 4
        if (Hyperkalemia == 3 | Hyperkalemia == 4)

          # go find the next result for patient where Is.Hemolyzed = FALSE
          # how do I get the next result?
          for (i+1)

            # assign date to compare to first date
            next_result_date = pots$RESULT_DATE
            if next_result_date > result_date
              then repeated_same_day <- FALSE
            else if next_result_date == result_date
              then repeated_same_day <- TRUE
      }
    }
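A minimal sketch of how the pseudocode above could be written as runnable R. It assumes `pots` has the columns from the sample (`ID`, `RESULT_DATE`, `Hyperkalemia`, `Is.Hemolyzed`) and is already sorted by patient and date. `which(pots$ID == id)` gives each patient's row numbers, and the "next result" is simply that patient's next row. "Same day" is checked by comparing the `RESULT_DATE` values for equality. The helper name `flag_same_day_repeats` is hypothetical.

```r
# Hypothetical helper: for each test with Hyperkalemia 3 or 4 that is not
# hemolyzed, record whether the same patient's next test has the same date.
# Assumes rows are already sorted by ID and then by result date.
flag_same_day_repeats <- function(pots) {
  pots$met_criteria <- FALSE
  pots$repeated_same_day <- NA
  for (id in unique(pots$ID)) {
    rows <- which(pots$ID == id)  # this patient's row numbers, in date order
    for (k in seq_along(rows)) {
      i <- rows[k]
      qualifies <- pots$Hyperkalemia[i] %in% c(3, 4) && isFALSE(pots$Is.Hemolyzed[i])
      if (!qualifies) next
      pots$met_criteria[i] <- TRUE
      # the next result is the patient's next row; the last row has none
      pots$repeated_same_day[i] <- k < length(rows) &&
        pots$RESULT_DATE[rows[k + 1]] == pots$RESULT_DATE[i]
    }
  }
  pots
}

# A few rows from the sample data in the question
pots <- data.frame(
  ID           = c(1, 1, 1, 3, 3, 3),
  RESULT_DATE  = c("5/29/2008", "5/29/2008", "5/30/2008",
                   "4/18/2012", "4/18/2012", "4/21/2012"),
  Hyperkalemia = c(2, 3, 2, 3, 4, 4),
  Is.Hemolyzed = FALSE
)

res <- flag_same_day_repeats(pots)
sum(res$met_criteria)                      # tests that qualified for a repeat: 4
sum(res$repeated_same_day, na.rm = TRUE)   # qualifying tests repeated the same day: 1
```

Using `%in%` rather than `==` means an `NA` Hyperkalemia score simply fails the check instead of raising an error inside `if`.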

Goal: I want to calculate how often (per unique ID) a grade 3 or 4 non-hemolyzed potassium result is followed by another potassium test within 24 hours. (I'm using a different date field now; I guess I can add a date function to calculate the 24 hours.)
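For the 24-hour version, one option is to parse the date field into date-times with `as.POSIXct()` and measure the gap to the next test with `difftime(..., units = "hours")`. This is a sketch under stated assumptions. The timestamps below are made up for illustration, and the `format` string `"%m/%d/%Y %H:%M"` is an assumption to adjust to the real field. The helper `hours_to_next_test` is hypothetical.

```r
# Hypothetical helper: hours from each test to the same patient's next test,
# NA for each patient's last test. Assumes rows are sorted by ID, then time.
hours_to_next_test <- function(id, times) {
  n <- length(times)
  if (n < 2) return(rep(NA_real_, n))
  gap <- as.numeric(difftime(times[-1], times[-n], units = "hours"))
  same_patient <- id[-1] == id[-n]
  c(ifelse(same_patient, gap, NA_real_), NA_real_)
}

# Made-up timestamps for illustration; the format string is an assumption
collected <- as.POSIXct(c("5/29/2008 08:00", "5/29/2008 20:30", "5/31/2008 09:00",
                          "4/18/2012 10:00", "4/19/2012 09:00"),
                        format = "%m/%d/%Y %H:%M", tz = "UTC")
ids <- c(1, 1, 1, 3, 3)

gaps <- hours_to_next_test(ids, collected)
within_24h <- !is.na(gaps) & gaps <= 24
gaps        # 12.5 36.5   NA 23.0   NA
within_24h  # TRUE FALSE FALSE TRUE FALSE
```

Comparing each row with the row below it avoids the inner loop. Masking the gaps where the patient changes keeps one patient's last test from being compared with the next patient's first test.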

Edit: I eventually got it working with for loops! Sharing in case it is helpful to anyone. Later I did spot a mistake, but for my data set it was okay.

library(dplyr)

# COLLECTED_DATE is read as text; difftime() can only parse it if it is in a
# standard date-time format (e.g. "2008-05-29 14:05"), otherwise convert it first
pots <- read.csv("phis_potassium-2015-07-30.csv",
                 header = TRUE, stringsAsFactors = FALSE)

pots <- arrange(pots, MRN, COLLECTED_DATE)

pots$Hyperkalemia[is.na(pots$Hyperkalemia)] <- 0
pots$repeated_wi24hours <- NA
pots$met_criteria <- NA
pots$next_test_time_interval <- NA_real_

# data.frame = pots
# for every patient (sorted by patient and collected date)
for (mrn in unique(pots$MRN)) {
  # row numbers of this patient's tests, in collected-date order
  rows <- which(pots$MRN == mrn)
  for (k in seq_along(rows)) {
    i <- rows[k]
    # if Hyperkalemia = 3 or 4 AND Is.Hemolyzed == FALSE
    if (pots$Hyperkalemia[i] %in% c(3, 4) && pots$Is.Hemolyzed[i] == FALSE) {
      pots$met_criteria[i] <- TRUE
      # the patient's last test has no next test to compare with
      if (k == length(rows)) {
        pots$repeated_wi24hours[i] <- FALSE
        next
      }
      # time interval (hours) between this test and the patient's next test
      pots$next_test_time_interval[i] <- as.numeric(
        difftime(pots$COLLECTED_DATE[rows[k + 1]], pots$COLLECTED_DATE[i], units = "hours"))
      # if the next test is within 24 hours, the test was repeated
      pots$repeated_wi24hours[i] <- pots$next_test_time_interval[i] <= 24
    }
  }
}

Desired output:

ID  RESULT_DATE  Hyperkalemia  Is.Hemolyzed  Met_criteria  Repeated
1   5/27/2008    2             FALSE
1   5/28/2008    2             FALSE
1   5/29/2008    2             FALSE
1   5/29/2008    2             FALSE
1   5/29/2008    3             FALSE         TRUE          FALSE
1   5/30/2008    2             FALSE
1   6/15/2008    4             FALSE
1   10/14/2014   1             FALSE
2   8/12/2013    2             FALSE
3   2/26/2012    2             FALSE
3   2/27/2012    2             FALSE
3   4/18/2012    3             FALSE         TRUE          TRUE
3   4/18/2012    4             FALSE         TRUE          FALSE
3   4/21/2012    4             FALSE         TRUE          FALSE

Answer: How about this:

metCriteria <- function( dfPots )
{
  (dfPots$Hyperkalemia==3 | dfPots$Hyperkalemia==4) & !dfPots$Is.Hemolyzed
}

#----------------------------------------------------------------------

pots <- read.table(filename, header=TRUE)

d <- paste( as.character(pots$RESULT_DATE),
            "_ID",
            as.character(pots$ID))

lastOccurence <- unlist(lapply(d,function(x){which.min(diff(c(d,FALSE)==x))}))

pots <- cbind(pots, data.frame(Met_criteria = rep(FALSE, nrow(pots)),
                               Repeated     = rep(TRUE,  nrow(pots))))

pots$Repeated[lastOccurence]                <- FALSE
pots$Met_criteria[which(metCriteria(pots))] <- TRUE

The dates and IDs are pasted together in the vector "d". The i-th component of the vector "lastOccurence" is the row number where the date/ID pair d[i] occurs for the last time. This relies on rows with the same date and ID being adjacent, as they are when the data are sorted.
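To see why the `which.min(diff(...))` expression finds the last occurrence, it helps to run it on a short vector. `c(d, FALSE) == x` marks the entries equal to `x`, and `diff()` of that logical vector is `-1` where a run of matches ends. `which.min()` therefore returns the last row of the first run. For comparison, base R's `duplicated(d, fromLast = TRUE)` marks every entry that is not the last copy, which matches the answer's "Repeated" column when equal pairs are adjacent. That comparison is an illustration, not part of the original answer, and the vector `d` below is a small made-up example.

```r
d <- c("5/29/2008 _ID 1", "5/29/2008 _ID 1", "5/29/2008 _ID 1",
       "5/30/2008 _ID 1", "4/18/2012 _ID 3", "4/18/2012 _ID 3")

# Row where each date/ID pair occurs for the last time (the answer's approach)
lastOccurence <- unlist(lapply(d, function(x) which.min(diff(c(d, FALSE) == x))))

# duplicated() from the end: TRUE for every row that is not the last copy
repeated <- duplicated(d, fromLast = TRUE)

lastOccurence  # 3 3 3 4 6 6
repeated       # TRUE TRUE FALSE FALSE TRUE FALSE
```

The `lapply()` version rescans the whole vector once per row, so it is quadratic in the number of rows. `duplicated()` is a single pass.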

The data frame "pots" is extended by two columns, "Met_criteria" and "Repeated":

  • "Met_criteria" is initialized to "FALSE". Then "which(metCriteria(pots))" picks the row numbers where the criteria are met, and in these rows "Met_criteria" is set to "TRUE".
  • "Repeated" is initialized to "TRUE". It is set to "FALSE" in the rows where the corresponding date and ID occur for the last time.

Example:

> pots
   ID RESULT_DATE Hyperkalemia Is.Hemolyzed Met_criteria Repeated
1   1   5/27/2008            2        FALSE        FALSE    FALSE
2   1   5/28/2008            2        FALSE        FALSE    FALSE
3   3   5/28/2008            2        FALSE        FALSE    FALSE
4   1   5/29/2008            2        FALSE        FALSE     TRUE
5   1   5/29/2008            2        FALSE        FALSE     TRUE
6   1   5/29/2008            3        FALSE         TRUE    FALSE
7   2   5/29/2008            4        FALSE         TRUE    FALSE
8   1   5/30/2008            2        FALSE        FALSE    FALSE
9   1   6/15/2008            4        FALSE         TRUE    FALSE
10  1  10/14/2014            1        FALSE        FALSE    FALSE
11  1  10/16/2014           NA        FALSE        FALSE    FALSE
12  2   8/12/2013            2        FALSE        FALSE    FALSE
13  3   2/26/2012            2        FALSE        FALSE    FALSE
14  3   2/27/2012            2        FALSE        FALSE    FALSE
15  3   4/18/2012            3        FALSE         TRUE     TRUE
16  3   4/18/2012            4        FALSE         TRUE    FALSE
17  3   4/21/2012            4        FALSE         TRUE    FALSE
18  3   4/23/2012            4        FALSE         TRUE    FALSE
19  3   4/27/2012            4        FALSE         TRUE    FALSE
20  3    5/8/2012            4        FALSE         TRUE    FALSE
21  3   5/12/2012            4        FALSE         TRUE    FALSE
22  3   5/15/2012            4        FALSE         TRUE     TRUE
23  3   5/15/2012           NA        FALSE        FALSE    FALSE
> 
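With both columns in place, the two totals the question asks for can be read off directly. Qualifying tests are the rows with `Met_criteria`. Because "Repeated" is set for every row regardless of the criteria (see rows 4 and 5 above), this sketch counts a repeat only where both flags are `TRUE`; that interpretation is an assumption. The small data frame copies the `ID`, `Met_criteria` and `Repeated` values of a few rows of the example output.

```r
# ID, Met_criteria and Repeated copied from rows 4-7, 15, 16 and 22 of the example
pots <- data.frame(
  ID           = c(1, 1, 1, 2, 3, 3, 3),
  Met_criteria = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Repeated     = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Repeated is set for every row, so count a repeat only when the test also qualified
qualified <- pots$Met_criteria
repeated  <- pots$Met_criteria & pots$Repeated

sum(qualified)   # 5 tests qualified for a repeat
sum(repeated)    # 2 of them were repeated the same day

# the same counts per patient ID
tapply(qualified, pots$ID, sum)
tapply(repeated,  pots$ID, sum)
```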

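Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.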

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM