
R: Looping through multiple columns and using all columns in a function?

In R, how do I loop through multiple columns and apply a custom function that takes an argument from each of those columns and modifies the columns accordingly?

For example, I have the following data frame:

> head(runTimeSep)
  hours   h minutes  min
1    70 min      NA <NA>
2    21 min      NA <NA>
3   106 min      NA <NA>
4    75 min      NA <NA>
5    14 min      NA <NA>
6    82 min      NA <NA>
7     1 h        11 min

My goal is to end up with the total minutes in the hours column. If the h column contains "h" (e.g. "1 h"), convert the hours to minutes and add the value from the minutes column (or add nothing if it's a whole hour with NA in the minutes column).

Therefore I have created the following function to apply to the dataframe:

# convert hours to minutes function
hoursToMins = function(hours, h, minutes, min) {
  if (h == 'h' && min == "min") {
    (hours = as.numeric(hours)*60+as.numeric(minutes))
  }
  if (h=="h" && min != "min") {
    (hours = as.numeric(hours)*60)
  }
}

How do I apply this function across all columns in the data frame? E.g. with lapply, ddply, etc.

Edit: I also attempted the following:

finalRunTime = ifelse(runTimeSep$h == "h", runTimeSep$hours*60, runTimeSep$hours)
head(finalRunTime)
runTimeSep$hours = finalRunTime

which worked fine. But when I tried to apply the second round of ifelse:

finalRunTime = ifelse(runTimeSep$min == "min", runTimeSep$hours +  runTimeSep$minutes, runTimeSep$hours)
head(finalRunTime)
runTimeSep$hours = finalRunTime

the second round causes the else case (rows with no value in the min column) to become NA. Please help. Thanks.

In response to @Sandipan's answer: How do I use which to discriminate whether the min column is 'min' or NA?
I tried:

indices <- which(runTimeSep$h == 'h' && runTimeSep$min != 'min')
runTimeSep[indices,]$hours <- 60*runTimeSep[indices, ]$hours

indices <- which(runTimeSep$h == 'h' && runTimeSep$min == 'min')
runTimeSep[indices,]$hours <- 60*runTimeSep[indices, ]$hours + 
  runTimeSep[indices,]$minutes

However both sets of indices returned empty sets.

This gives a vector of minutes, one per row (dat here stands for the data frame, runTimeSep in the question). If you want the grand total, wrap sum() around it:

with(dat, (h == "h") * 60 * hours + (h == "min") * hours +
          ifelse(is.na(minutes), 0, minutes))

[1]  70  21 106  75  14  82  71

It substitutes 0 for NA when minutes is NA. When a new column with those values is desired you can do this:

dat$newmins <- with(dat, (h == "h") * 60 * hours + (h == "min") * hours +
                         ifelse(is.na(minutes), 0, minutes))
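
If you would rather keep a per-row function, as the question asks, one option is mapply(), which walks several columns in parallel and passes one element from each to the function. The following is only a minimal sketch. It assumes the column names from the question, that hours and minutes are numeric, and that h is never NA. The data frame is rebuilt by hand from the head() output, and hoursToMins is rewritten so that it returns a value and tests is.na(minutes) instead of comparing against NA:

```r
# Rebuild the example data (values copied from head(runTimeSep) in the question).
runTimeSep <- data.frame(
  hours   = c(70, 21, 106, 75, 14, 82, 1),
  h       = c("min", "min", "min", "min", "min", "min", "h"),
  minutes = c(NA, NA, NA, NA, NA, NA, 11),
  min     = c(NA, NA, NA, NA, NA, NA, "min"),
  stringsAsFactors = FALSE
)

# Scalar helper: takes one value from each column and returns total minutes.
# The min argument is kept only to mirror the question's signature; it is unused.
# Assumption: h is never NA.
hoursToMins <- function(hours, h, minutes, min) {
  if (h == "h") {
    extra <- if (is.na(minutes)) 0 else minutes  # whole hour: add nothing
    hours * 60 + extra
  } else {
    hours  # already in minutes
  }
}

# mapply() calls hoursToMins once per row, pairing up the four columns.
totalMins <- mapply(hoursToMins,
                    runTimeSep$hours, runTimeSep$h,
                    runTimeSep$minutes, runTimeSep$min)
print(totalMins)
```

Because the function is called once per row, this is slower than the vectorised with() expression on large data frames, but it keeps the if/else logic readable.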

You want something like this:

indices <- which(runTimeSep$h == 'h')
runTimeSep[indices,]$hours <- 60*runTimeSep[indices, ]$hours + 
                              runTimeSep[indices,]$minutes
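
On the follow-up about which(): && compares single values only. Older versions of R silently use just the first element of each side, and R 4.3.0 and later raise an error for longer inputs. With the first row having h equal to "min", which() therefore sees a single FALSE and returns an empty set. The element-wise operator is &. Comparing anything against NA also yields NA, so is.na() is the safer test for a missing minutes value. A sketch under the question's assumptions (same column names, hours and minutes numeric), with the data frame rebuilt by hand from the head() output:

```r
# Rebuild the example data (values copied from head(runTimeSep) in the question).
runTimeSep <- data.frame(
  hours   = c(70, 21, 106, 75, 14, 82, 1),
  h       = c("min", "min", "min", "min", "min", "min", "h"),
  minutes = c(NA, NA, NA, NA, NA, NA, 11),
  min     = c(NA, NA, NA, NA, NA, NA, "min"),
  stringsAsFactors = FALSE
)

# Element-wise tests: use & (not &&), and is.na() for the missing minutes.
isHour     <- runTimeSep$h == "h"
hasMinutes <- !is.na(runTimeSep$minutes)

# Whole hours with no minutes part: just convert to minutes.
indices <- which(isHour & !hasMinutes)
runTimeSep$hours[indices] <- 60 * runTimeSep$hours[indices]

# Hours plus minutes: convert and add the minutes column.
indices <- which(isHour & hasMinutes)
runTimeSep$hours[indices] <- 60 * runTimeSep$hours[indices] +
                             runTimeSep$minutes[indices]

print(runTimeSep$hours)
```

Both index sets are computed from isHour and hasMinutes before hours is modified, so the second step does not depend on what the first one changed.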
