
How to replace multiple nested for loops with apply family functions in R?

I have four main variables in my dataset (dat).

  1. SubjectID
  2. Group (can be Easy1, Easy2, Hard1, Hard2)
  3. Object (x, y, z, w)
  4. Reaction time

For each combination of variables 1, 2 and 3 I want to change the reaction time, so that all values above the 3rd Quartile + 1.5IQR are set to the value of 3rd Quartile + 1.5 IQR.
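In formula form, the cap described here is the upper Tukey fence. Here Q1 and Q3 are the first and third quartiles of the reaction times within one SubjectID/Group/Object combination:

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
F_{\text{upper}} = Q_3 + 1.5\,\mathrm{IQR}, \qquad
\mathrm{RT}' = \min\left(\mathrm{RT},\, F_{\text{upper}}\right)
```

For example, with Q1 = 2 and Q3 = 4 the fence is 4 + 1.5 · 2 = 7, so any value above 7 becomes 7.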

TUK <- function (a,b,c) {
....
}

Basically, the for loop logic would be:

for (i in unique(dat$SubjectID)) {
  for (j in unique(dat$Group)) {
    for (k in unique(dat$Object)) {
      TUK(i, j, k)
    }
  }
}

How can I do this with the apply family of functions?

Thank you!

Adding a reproducible example:

SubjectID <- c(3772113,3772468)
Group <- c("Easy","Hard")
Object <- c("A","B")
dat <- data.frame(expand.grid(SubjectID,Group,Object))
dat$RT <- rnorm(8,1500,700)
colnames(dat) <- c("SubjectID","Group","Object","RT")

TUK <- function (SUBJ,GROUP,OBJECT){
  p <- dat[dat$SubjectID==SUBJ & dat$Group== GROUP & dat$Object==OBJECT, "RT"]

  p[p < 1000 | p > 2000] <- NA  # p is a plain vector here, so p$RT would fail

  dat[dat$SubjectID==SUBJ & dat$Group== GROUP & dat$Object==OBJECT, "RT"]<<- p
}

A big part of your problem is that your TUK function is terrible. Here are some reasons why:

  • Problem: It depends on having a data frame named dat in the global environment. Change the name of your data and it breaks.

    • Solution: you should pass in all arguments needed. In this case, dat should be an argument.
  • Problem: Global assignment with <<- should be avoided. There are certain advanced cases where it is necessary (e.g., sometimes in Shiny apps), but in general it makes a function behave in very un-R-like ways.

    • Solution: Simply return() a value and assign it like any other normal R function.
  • Problem: It's over-complicated. You pass in SUBJ, GROUP, and OBJECT only to use them for subsetting, which means you are trying to do, inside your function, the "grouping" step that dplyr, data.table, or base::ave excel at. It's as if you built your function so that it could only ever be used inside this particular for loop.

    • Solution: Functions should be simple building blocks. Make this a function of just a single vector. It will be much cleaner and easier to debug. Once it works on a single vector, use dplyr, data.table, or ave (or even a for loop) to do the split-apply-combine. This also makes your function more generally useful instead of cementing it to this one particular case.

With the above in mind, here's an attempted re-write:

TUK2 <- function (RT){
  RT[RT < 1000 | RT > 2000] <- NA
  return(RT)
}
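Because TUK2 takes a single vector, it can be checked on its own before any grouping is involved. A quick sketch with made-up reaction times (the comparisons are strict, so a value of exactly 1000 is kept):

```r
# Same function as above: blank out reaction times outside [1000, 2000]
TUK2 <- function(RT) {
  RT[RT < 1000 | RT > 2000] <- NA
  return(RT)
}

# Made-up values for illustration only
TUK2(c(500, 1500, 2500, 1000))
# returns NA 1500 NA 1000
```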

See how much simpler that is! Now if we want to apply this function to each SubjectID:Group:Object grouping in your data and store the result in a new column, we can do this with dplyr:

library(dplyr)
group_by(dat, Group, SubjectID, Object) %>%
    mutate(new_RT = TUK2(RT))

dplyr groups the data, splits it, applies the simple function to each piece, and combines the results back together for us.
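For comparison, the same grouping can be done in base R with ave(), mentioned above as an alternative. This is a minimal sketch that rebuilds the question's sample data; the set.seed() call and the named arguments to expand.grid() (instead of renaming with colnames()) are additions for a self-contained, reproducible run:

```r
set.seed(1)  # added for reproducibility; not in the original example
dat <- expand.grid(SubjectID = c(3772113, 3772468),
                   Group = c("Easy", "Hard"),
                   Object = c("A", "B"))
dat$RT <- rnorm(8, 1500, 700)

TUK2 <- function(RT) {
  RT[RT < 1000 | RT > 2000] <- NA
  return(RT)
}

# ave() splits RT by every SubjectID:Group:Object combination,
# applies TUK2 to each piece, and puts the results back in row order
dat$new_RT <- ave(dat$RT, dat$SubjectID, dat$Group, dat$Object, FUN = TUK2)
```

Like the dplyr version, this leaves RT untouched and adds new_RT alongside it.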


Now, in your question, you said

For each combination of variables 1, 2 and 3 I want to change the reaction time, so that all values above the 3rd Quartile + 1.5IQR are set to the value of 3rd Quartile + 1.5 IQR.

This doesn't sound much like what your function does (it sets reaction times below 1000 or above 2000 to NA rather than capping them). Based only on this description, I would code it as

group_by(dat, Group, SubjectID, Object) %>%
    mutate(new_RT = pmin(RT, quantile(RT, probs = 0.75) + 1.5 * IQR(RT)))

pmin is for parallel minimum: it's a vectorized way to take the elementwise smaller of two vectors. Try, e.g., pmin(1:10, 7) to see what it does.
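To see pmin() and the Q3 + 1.5 IQR cap on concrete numbers, here is a small sketch. cap_upper is a hypothetical helper name, not part of any package, and the quartiles come from quantile()'s default (type 7) method:

```r
pmin(1:10, 7)
# returns 1 2 3 4 5 6 7 7 7 7

# Hypothetical helper: cap values above Q3 + 1.5 * IQR at that fence
cap_upper <- function(x) {
  fence <- quantile(x, probs = 0.75) + 1.5 * IQR(x)
  pmin(x, unname(fence))
}

# Q1 = 2, Q3 = 4, IQR = 2, so the fence is 4 + 1.5 * 2 = 7
cap_upper(c(1, 2, 3, 4, 100))
# returns 1 2 3 4 7
```

Because cap_upper works on a single vector, it can be dropped into mutate() or ave() per group exactly like TUK2.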

In both examples the result won't be saved, of course, unless you re-assign it, e.g. dat <- group_by(dat, ...) etc. This is the functional programming way of doing things: no global assignment.


One additional note: with the rewritten function you could still use loops instead of dplyr. I don't know why you would (surely the dplyr syntax is nicer), but I want to illustrate that the small building-block function is generally useful; it's not "baking in" dplyr the way your original function was "baking in" a particular for loop.

for (sub in unique(dat$SubjectID)) {
  for (obj in unique(dat$Object)) {
    for (grp in unique(dat$Group)) {
      dat[dat$SubjectID == sub & 
            dat$Object == obj & 
            dat$Group == grp, "RT"] <-
        TUK2(
          dat[dat$SubjectID == sub & 
                dat$Object == obj & 
                dat$Group == grp, "RT"]
        )
    }
  }
}
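Since the question asked specifically about the apply family, here is a base R sketch of the same split-apply-combine using split(), lapply(), and unsplit(). It regenerates the question's data (with an added set.seed() so it runs on its own) and reuses TUK2; the names grp and pieces are illustrative:

```r
set.seed(42)  # added so the example is reproducible
dat <- expand.grid(SubjectID = c(3772113, 3772468),
                   Group = c("Easy", "Hard"),
                   Object = c("A", "B"))
dat$RT <- rnorm(8, 1500, 700)

TUK2 <- function(RT) {
  RT[RT < 1000 | RT > 2000] <- NA
  return(RT)
}

# One grouping factor covering every SubjectID:Group:Object combination
grp <- interaction(dat$SubjectID, dat$Group, dat$Object, drop = TRUE)

# Split RT into per-group vectors, apply TUK2 to each piece,
# then reassemble the pieces in the original row order
pieces <- lapply(split(dat$RT, grp), TUK2)
dat$new_RT <- unsplit(pieces, grp)
```

This is essentially what ave() does internally, spelled out with lapply() so the apply-family step is visible.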

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM