简体   繁体   中英

Apply function to each row of data.frame and preserve column classes

I wonder if there is a way to apply a function to each row of a data.frame such that the column classes are preserved? Let's look at an example to clarify what I mean:

test <- data.frame(startdate = as.Date(c("2010-03-07", "2013-09-13", "2011-11-12")),
                   enddate = as.Date(c("2010-03-23", "2013-12-01", "2012-01-05")),
                   nEvents = c(123, 456, 789))

Suppose I would like to expand the data.frame test by inserting all days between startdate and enddate and distribute the number of events over those days. My first try to do so was this:

eventsPerDay1 <- function(row) {
    n_days <- as.numeric(row$enddate - row$startdate) + 1
    data.frame(date = seq(row$startdate, row$enddate, by = "1 day"),
               nEvents = rmultinom(1, row$nEvents, rep(1/n_days, n_days)))
}

apply(test, 1, eventsPerDay1)

This, however, is not possible because apply calls as.matrix on test and thus it gets converted to a character matrix and all column classes are lost.

I already found two workarounds which you can find below, so my question is more of a philosphical nature.

library(magrittr)
############# Workaround 1
eventsPerDay2 <- function(startdate, enddate, nEvents) {
    n_days <- as.numeric(enddate - startdate) + 1
    data.frame(date = seq(startdate, enddate, by = "1 day"),
               nEvents = rmultinom(1, nEvents, rep(1/n_days, n_days)))
}

mapply(eventsPerDay2, test$startdate, test$enddate, test$nEvents, SIMPLIFY = F) %>%
    do.call(rbind, .)


############# Workaround 2
seq_along(test) %>%
    lapply(function(i) test[i, ]) %>%
    lapply(eventsPerDay1) %>%
    do.call(rbind, .)

My "problem" with the workarounds is the following:

  • Workaround 1: It may not be the best reason, but I simply do not like mapply . It has a different signature than the other *apply functions (as the the order of arguments differs) and I always feel that a for loop would just have been clearer.
  • Workaround 2: While being very flexible, I think it is not clear at first sight what is happening.

So does anyone know a function whose call would look like apply(test, 1, eventsPerDay1) and that will work?

We can do this with data.table

library(data.table)
res <- setDT(test)[,n_days := as.numeric(enddate - startdate) + 1 
           ][, .(date = seq(startdate, enddate, by= "1 day"),
          nEvents = c(rmultinom(1, nEvents, rep(1/n_days, n_days)))),
        by =  1:nrow(test)][, nrow := NULL]
str(res)
#Classes ‘data.table’ and 'data.frame':  152 obs. of  2 variables:
# $ date   : Date, format: "2010-03-07" "2010-03-08" "2010-03-09" "2010-03-10" ...
# $ nEvents: int  5 9 7 11 6 6 10 7 12 3 ...

The above can be wrapped in a function

eventsPerDay <- function(dat){  
      as.data.table(dat)[, n_days:= as.numeric(enddate - startdate) + 1
       ][, .(date = seq(startdate, enddate, by= "1 day"),
    nEvents = c(rmultinom(1, nEvents, rep(1/n_days, n_days)))) , 1:nrow(dat)
        ][, nrow := NULL][]
  }

eventsPerDay(test)

Another idea:

library(dplyr)
library(tidyr)

test %>%
  mutate(id = row_number()) %>%
  group_by(startdate) %>%
  complete(startdate = seq.Date(startdate, enddate, 1), nesting(id)) %>%
  group_by(id) %>%
  mutate(nEvents = rmultinom(1, first(nEvents), rep(1/n(), n()))) %>%
  select(startdate, nEvents)

Which gives:

#Source: local data frame [152 x 3]
#Groups: id [3]
#
#      id  startdate nEvents
#   <int>     <date>   <int>
#1      1 2010-03-07       6
#2      1 2010-03-08       6
#3      1 2010-03-09       6
#4      1 2010-03-10       7
#5      1 2010-03-11      12
#6      1 2010-03-12       5
#7      1 2010-03-13       8
#8      1 2010-03-14       5
#9      1 2010-03-15       5
#10     1 2010-03-16       9
## ... with 142 more rows

I have asked myself the same question.

I either end up splitting the df into a list (the base way)

xy <- data.frame()
xy.list <- split(xy, 1:nrow(xy))
out <- lapply(xy.list, function(x) ...)
answer <- unlist(out)

or try the hadleyverse dplyr way using rowwise (the blackbox way)

xy %>%
rowwise() %>%
mutate(newcol = function(x) ....)

I agree that their should be a base implementation of apply(xy, 1, function(x)) that doesn't coerce into character, but I imagine the R ancients implemented the matrix conversion for an advanced reason my primitive mind can't understand.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM