
R data.table creating a custom function using lapply to create and reassign multiple variables

I have the following lines of code:

DT[flag==T, temp:=haz_1.5]
DT[, temp:= na.locf(temp, na.rm = FALSE), "pid"]
DT[agedays==61, haz_1.5_1:=temp]

I need to convert this into a function so that it works on a list of variables instead of a single one. I recently learned how to use lapply to create one set of new columns from a list of columns and conditions. However, I'm unsure how to do this when I also need to carry values forward within each of those columns.

For instance, I can code the following:

columns <- c("haz_1.5", "waz_1.5")
new_cols <- paste(columns, "1", sep = "_")
x <- 61
maled_anthro[(flag == TRUE) & (agedays == x), (new_cols) := lapply(.SD, function(y) na.locf(y, na.rm = FALSE)), .SDcols = columns]

But this skips the per-pid na.locf step, so I am not getting the same output as the original lines of code. How would I incorporate the line that uses na.locf to carry values forward (DT[, temp:= na.locf(temp, na.rm = FALSE), "pid"]) so that everything is wrapped up in a single function? Would this work with lapply in the same manner?

Dummy data that's similar to the data table I'm using:

DT <- data.table(pid  = c(1,1,2,3,3,4,4,5,5,5),
                 flag = c(T,T,F,T,T,F,T,T,T,T),
                 agedays = c(1,61,61,51,61,23,61,1,32,61),
                 haz_1.5 = c(1,1,1,2,NA,1,3,2,3,4),
                 waz_1.5 = c(1,NA,NA,NA,NA,2,2,3,4,4))

OP's code can be turned into an anonymous function that is applied to the selected columns:

library(data.table)
columns <- c("haz_1.5", "waz_1.5")
new_cols <- paste0(columns, "_1")
x <- 61

DT[, (new_cols) := lapply(.SD, function(v) {
  temp <- fifelse(flag, v, NA_real_)
  temp <- nafill(temp, "locf")
  fifelse(agedays == x, temp, NA_real_)
}), .SDcols = columns, by = pid][]
    pid  flag agedays haz_1.5 waz_1.5 haz_1.5_1 waz_1.5_1
 1:   1  TRUE       1       1       1        NA        NA
 2:   1  TRUE      61       1      NA         1         1
 3:   2 FALSE      61       1      NA        NA        NA
 4:   3  TRUE      51       2      NA        NA        NA
 5:   3  TRUE      61      NA      NA         2        NA
 6:   4 FALSE      23       1       2        NA        NA
 7:   4  TRUE      61       3       2         3         2
 8:   5  TRUE       1       2       3        NA        NA
 9:   5  TRUE      32       3       4        NA        NA
10:   5  TRUE      61       4       4         4         4

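This is the same result we get when we manually repeat OP's code for the two columns. Note that the temp column must be removed before it is reused: the subsetted assignment by reference only overwrites the rows where flag is TRUE, so values left over from the previous column would otherwise remain in the other rows.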

DT[(flag), temp := haz_1.5]
DT[, temp := zoo::na.locf(temp, na.rm = FALSE), by = pid]
DT[agedays == 61, haz_1.5_1 := temp]
DT[, temp := NULL]
DT[(flag), temp := waz_1.5]
DT[, temp := zoo::na.locf(temp, na.rm = FALSE), by = pid]
DT[agedays == 61, waz_1.5_1 := temp]
DT[, temp := NULL][]
    pid  flag agedays haz_1.5 waz_1.5 haz_1.5_1 waz_1.5_1
 1:   1  TRUE       1       1       1        NA        NA
 2:   1  TRUE      61       1      NA         1         1
 3:   2 FALSE      61       1      NA        NA        NA
 4:   3  TRUE      51       2      NA        NA        NA
 5:   3  TRUE      61      NA      NA         2        NA
 6:   4 FALSE      23       1       2        NA        NA
 7:   4  TRUE      61       3       2         3         2
 8:   5  TRUE       1       2       3        NA        NA
 9:   5  TRUE      32       3       4        NA        NA
10:   5  TRUE      61       4       4         4         4
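
Since the question asks for a reusable function, the anonymous function can also be wrapped in a named one. The following is only a minimal sketch: the name carry_forward_at and its arguments (dt, cols, age, suffix) are hypothetical and not part of data.table, and it assumes that the table has columns named pid, flag and agedays and that the selected columns are of type double (fifelse() requires both branches to have the same type).

```r
library(data.table)

# Hypothetical helper (not part of data.table): for each column in `cols`,
# keep only flagged values, carry them forward within each pid, and write
# the carried value to a new column on rows where agedays equals `age`.
# Assumes `dt` has columns pid, flag, agedays and that `cols` are double.
carry_forward_at <- function(dt, cols, age, suffix = "_1") {
  new_cols <- paste0(cols, suffix)
  dt[, (new_cols) := lapply(.SD, function(v) {
    temp <- fifelse(flag, v, NA_real_)       # keep flagged values only
    temp <- nafill(temp, type = "locf")      # carry forward within the group
    fifelse(agedays == age, temp, NA_real_)  # keep only rows at the target age
  }), .SDcols = cols, by = pid]
  invisible(dt)                              # dt is modified by reference
}

# Usage with the dummy data from the question
DT <- data.table(pid  = c(1,1,2,3,3,4,4,5,5,5),
                 flag = c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE),
                 agedays = c(1,61,61,51,61,23,61,1,32,61),
                 haz_1.5 = c(1,1,1,2,NA,1,3,2,3,4),
                 waz_1.5 = c(1,NA,NA,NA,NA,2,2,3,4,4))
carry_forward_at(DT, c("haz_1.5", "waz_1.5"), age = 61)
DT[]
```

Because := assigns by reference, the helper changes DT in place and returns it invisibly, so the trailing DT[] is only there to print the result.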

Some explanations

  • There is one important difference between OP's "single column" code and this approach: the anonymous function is called once for each group of the grouping variable pid. In OP's code, the first and last assignments work on the ungrouped (full) vectors, which may be somewhat more efficient. However, those two assignments do not depend on pid, so the result is the same.
  • Instead of zoo::na.locf(), data.table's nafill() function is used (new with data.table v1.12.4, on CRAN 03 Oct 2019)
  • DT[(flag), ...] is equivalent to DT[flag == TRUE, ...]
  • When fifelse() is used instead of subsetted assignment by reference, the no argument must be NA (here NA_real_, to match the type of the column) so that unselected rows end up as NA in both cases. Thus, DT[, temp:= fifelse(flag, haz_1.5, NA_real_)][] is equivalent to DT[(flag), temp:= haz_1.5][]
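
The last two points can be checked on a small example. This is just an illustrative sketch on a reduced version of the dummy data; the column names temp_subset, temp_fifelse and temp_locf are made up for the comparison.

```r
library(data.table)

DT <- data.table(pid     = c(1, 1, 2, 3, 3),
                 flag    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
                 haz_1.5 = c(1, 1, 1, 2, NA))

# Subsetted assignment: rows where flag is FALSE are left as NA
DT[(flag), temp_subset := haz_1.5]

# Vectorised form: every row is assigned, with NA_real_ where flag is FALSE
DT[, temp_fifelse := fifelse(flag, haz_1.5, NA_real_)]

# Compare the two columns
same <- identical(DT$temp_subset, DT$temp_fifelse)

# Carry the last non-missing value forward within each pid
DT[, temp_locf := nafill(temp_fifelse, type = "locf"), by = pid]
DT[]
```

On a fresh table, same is expected to be TRUE, because the subsetted assignment creates the new column filled with NA before writing the selected rows.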

