Apply function to each row of data.frame and preserve column classes

Question

I wonder if there is a way to apply a function to each row of a data.frame such that the column classes are preserved? Let's look at an example to clarify what I mean:

test <- data.frame(startdate = as.Date(c("2010-03-07", "2013-09-13", "2011-11-12")),
                   enddate = as.Date(c("2010-03-23", "2013-12-01", "2012-01-05")),
                   nEvents = c(123, 456, 789))

Suppose I would like to expand the data.frame test by inserting all days between startdate and enddate and distribute the number of events over those days. My first try to do so was this:

eventsPerDay1 <- function(row) {
    n_days <- as.numeric(row$enddate - row$startdate) + 1
    data.frame(date = seq(row$startdate, row$enddate, by = "1 day"),
               nEvents = rmultinom(1, row$nEvents, rep(1/n_days, n_days)))
}

apply(test, 1, eventsPerDay1)

This, however, is not possible because apply calls as.matrix on test and thus it gets converted to a character matrix and all column classes are lost.

I already found two workarounds which you can find below, so my question is more of a philosphical nature.

library(magrittr)
############# Workaround 1
eventsPerDay2 <- function(startdate, enddate, nEvents) {
    n_days <- as.numeric(enddate - startdate) + 1
    data.frame(date = seq(startdate, enddate, by = "1 day"),
               nEvents = rmultinom(1, nEvents, rep(1/n_days, n_days)))
}

mapply(eventsPerDay2, test$startdate, test$enddate, test$nEvents, SIMPLIFY = F) %>%
    do.call(rbind, .)


############# Workaround 2
seq_along(test) %>%
    lapply(function(i) test[i, ]) %>%
    lapply(eventsPerDay1) %>%
    do.call(rbind, .)

My "problem" with the workarounds is the following:

Workaround 1: It may not be the best reason, but I simply do not like mapply . It has a different signature than the other *apply functions (as the the order of arguments differs) and I always feel that a for loop would just have been clearer.
Workaround 2: While being very flexible, I think it is not clear at first sight what is happening.

So does anyone know a function whose call would look like apply(test, 1, eventsPerDay1) and that will work?

Answer 1

We can do this with data.table

library(data.table)
res <- setDT(test)[,n_days := as.numeric(enddate - startdate) + 1 
           ][, .(date = seq(startdate, enddate, by= "1 day"),
          nEvents = c(rmultinom(1, nEvents, rep(1/n_days, n_days)))),
        by =  1:nrow(test)][, nrow := NULL]
str(res)
#Classes ‘data.table’ and 'data.frame':  152 obs. of  2 variables:
# $ date   : Date, format: "2010-03-07" "2010-03-08" "2010-03-09" "2010-03-10" ...
# $ nEvents: int  5 9 7 11 6 6 10 7 12 3 ...

The above can be wrapped in a function

eventsPerDay <- function(dat){  
      as.data.table(dat)[, n_days:= as.numeric(enddate - startdate) + 1
       ][, .(date = seq(startdate, enddate, by= "1 day"),
    nEvents = c(rmultinom(1, nEvents, rep(1/n_days, n_days)))) , 1:nrow(dat)
        ][, nrow := NULL][]
  }

eventsPerDay(test)

Answer 2

Another idea:

library(dplyr)
library(tidyr)

test %>%
  mutate(id = row_number()) %>%
  group_by(startdate) %>%
  complete(startdate = seq.Date(startdate, enddate, 1), nesting(id)) %>%
  group_by(id) %>%
  mutate(nEvents = rmultinom(1, first(nEvents), rep(1/n(), n()))) %>%
  select(startdate, nEvents)

Which gives:

#Source: local data frame [152 x 3]
#Groups: id [3]
#
#      id  startdate nEvents
#   <int>     <date>   <int>
#1      1 2010-03-07       6
#2      1 2010-03-08       6
#3      1 2010-03-09       6
#4      1 2010-03-10       7
#5      1 2010-03-11      12
#6      1 2010-03-12       5
#7      1 2010-03-13       8
#8      1 2010-03-14       5
#9      1 2010-03-15       5
#10     1 2010-03-16       9
## ... with 142 more rows

Answer 3

I have asked myself the same question.

I either end up splitting the df into a list (the base way)

xy <- data.frame()
xy.list <- split(xy, 1:nrow(xy))
out <- lapply(xy.list, function(x) ...)
answer <- unlist(out)

or try the hadleyverse dplyr way using rowwise (the blackbox way)

xy %>%
rowwise() %>%
mutate(newcol = function(x) ....)

I agree that their should be a base implementation of apply(xy, 1, function(x)) that doesn't coerce into character, but I imagine the R ancients implemented the matrix conversion for an advanced reason my primitive mind can't understand.

Apply function to each row of data.frame and preserve column classes

Question

3 answers

solution1
2 2016-09-14 11:28:38

solution2
2 2016-09-14 11:40:33

solution3
0 2018-07-03 12:47:32

Apply function to each row of data.frame and preserve column classes

Question

3 answers

solution1 2 2016-09-14 11:28:38

solution2 2 2016-09-14 11:40:33

solution3 0 2018-07-03 12:47:32

solution1
2 2016-09-14 11:28:38

solution2
2 2016-09-14 11:40:33

solution3
0 2018-07-03 12:47:32