
Defining custom dplyr methods in an R package

I have a package with custom summary() and print() methods for objects of a particular class. The package also uses the wonderful dplyr package for data manipulation, and I expect my users to write scripts that use both my package and dplyr.

One roadblock, which others have noted in earlier threads, is that dplyr verbs don't preserve custom classes. An ungroup call, for example, can strip my data.frames of their custom classes and thus break method dispatch for summary(), etc.

Hadley says "doing this correctly is up to you - you need to define a method for your class for each dplyr method that correctly restores all the classes and attributes", and I'm trying to follow that advice, but I can't figure out how to wrap the dplyr verbs correctly.

Here's a simple toy example. Let's say I've defined a cars class, and I have a custom summary for it.

This works:

library(tidyverse)

class(mtcars) <- c('cars', class(mtcars))

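# The summary() generic's first argument is `object`, so the method uses it too
summary.cars <- function(object, ...) {
  # gather some summary stats from the object passed in (not the global mtcars)
  df_dim <- dim(object)
  quantile_sum <- map(object, quantile)
  
  cat("A cars object with:\n")
  cat(df_dim[[1]], 'rows and', df_dim[[2]], 'columns.\n')
  
  print(quantile_sum)

}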

summary(mtcars)

Here's the problem:

small_cars <- mtcars %>% filter(cyl < 6)
summary(small_cars)
class(small_cars)

That summary() call on small_cars gives me the default summary rather than my custom method, because small_cars no longer has the cars class after dplyr filtering.

What I tried:

First I tried writing a custom method for filter (filter.cars). That didn't work, because filter is actually a wrapper around filter_ that adds non-standard evaluation.

So I wrote a custom filter_ method for cars objects, attempting to implement @jwdink's advice:

filter_.cars <- function(df, ...) {
  
  old_classes <- class(df)
  out <- dplyr::filter_(df, ...)
  new_classes <- class(out)
  
  class(out) <- c(new_classes, old_classes) %>% unique()
  
  out
}

That doesn't work - I get an infinite recursion error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?

All I want to do is grab the classes of the incoming df, hand off to dplyr, then return the object with the same class names it had before the dplyr call. How do I change my filter_ wrapper to accomplish that? Thanks!

Inside filter_.cars, the call to dplyr::filter_() dispatches on the class of df, which still includes cars, so it lands right back in filter_.cars: hence the infinite recursion.

Following the advice in the issue you linked, try removing that class from df before calling filter_ in your updated method:

class(df) <- class(df)[-1]  # drop 'cars' (the first class) before calling dplyr::filter_()
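Putting that together, here is a minimal base-R sketch of the strip-then-restore pattern. To keep it runnable without dplyr, it uses a hypothetical generic `keep_rows()` as a stand-in for `filter_()`; `keep_rows()` is not part of dplyr, and its default method simply mimics a verb that drops custom classes.

```r
# Hypothetical generic standing in for an old-style dplyr verb such as
# filter_(); keep_rows() is NOT a dplyr function.
keep_rows <- function(.data, ...) UseMethod("keep_rows")

# Default method: subset rows, then drop any custom classes, mimicking a verb
# that returns a plain data.frame.
keep_rows.default <- function(.data, rows, ...) {
  out <- .data[rows, , drop = FALSE]
  class(out) <- "data.frame"
  out
}

# Method for 'cars' objects: strip 'cars' before re-calling the generic so that
# dispatch goes to the default method instead of back into keep_rows.cars().
keep_rows.cars <- function(.data, ...) {
  old_classes <- class(.data)
  class(.data) <- setdiff(old_classes, "cars")
  out <- keep_rows(.data, ...)
  # Restore the original classes in front of whatever the verb returned.
  class(out) <- unique(c(old_classes, class(out)))
  out
}

cars_df <- datasets::mtcars
class(cars_df) <- c("cars", class(cars_df))

small_cars <- keep_rows(cars_df, cars_df$cyl < 6)
class(small_cars)  # "cars" "data.frame"
```

Using `setdiff()` rather than `[-1]` avoids assuming that `cars` is always the first class.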

UPDATE:

Some things have changed since my original answer:

  • Many dplyr verbs no longer remove custom classes; for example, dplyr::filter keeps the class. However, some verbs, such as dplyr::group_by, still remove it, so this question lives on.
  • Starting with R 3.5, S3 method lookup changed its scoping rules.
  • The trailing-underscore versions of the verbs are deprecated.

I recently ran into a hard-to-diagnose issue caused by the second bullet, so I wanted to give a fuller example. Say you're using a custom class named custom_class and you want to add a group_by method. Assuming you're using roxygen:

#' group_by.custom_class
#' 
#' @description Preserve the class of a `custom_class` object.
#' @inheritParams dplyr::group_by
#'
#' @importFrom dplyr group_by
#'
#' @export
#' @method group_by custom_class
group_by.custom_class <- function(.data, ...) {
  result <- NextMethod()
  return(reclass(.data, result))
}

(see original answer for definition of reclass function)

Highlights:

  • You need @method group_by custom_class to add S3method(group_by,custom_class) to NAMESPACE
  • You need @importFrom dplyr group_by to add importFrom(dplyr,group_by) to your NAMESPACE

I believe in R < 3.5 you could get away with just that second one, but now you need both.
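For reference, after running roxygen (for example via devtools::document()), the two tags above should produce NAMESPACE entries along these lines. This is a sketch of the expected output, not copied from a real package:

```r
# Generated by roxygen2: do not edit by hand

S3method(group_by,custom_class)
importFrom(dplyr,group_by)
```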


OLD ANSWER:

Further suggestions were offered in the thread, so I thought I'd update with what seems to be best practice, which is to use NextMethod().

filter_.cars <- function(.data, ...) {
   result <- NextMethod()
   reclass(.data, result)
}

where reclass is a helper you write yourself; it's just a generic that, at minimum, adds the original class back on:

reclass <- function(x, result) {
  UseMethod('reclass')
}

reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  result
}
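A self-contained sketch of the NextMethod() + reclass approach, again using a hypothetical generic `keep_rows()` (not a dplyr function) so it runs with base R only. `NextMethod()` calls the next method in the object's class vector with the same arguments, so it never re-dispatches on `cars` and cannot recurse.

```r
# Hypothetical stand-in for a dplyr verb that drops custom classes.
keep_rows <- function(.data, ...) UseMethod("keep_rows")

keep_rows.data.frame <- function(.data, rows, ...) {
  out <- .data[rows, , drop = FALSE]
  class(out) <- "data.frame"  # mimic a verb that discards extra classes
  out
}

# reclass generic and default method, copied from the answer
reclass <- function(x, result) UseMethod("reclass")

reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  result
}

# For class c("cars", "data.frame"), NextMethod() goes to
# keep_rows.data.frame() rather than back to keep_rows.cars().
keep_rows.cars <- function(.data, ...) {
  result <- NextMethod()
  reclass(.data, result)
}

cars_df <- datasets::mtcars
class(cars_df) <- c("cars", class(cars_df))

small_cars <- keep_rows(cars_df, cars_df$cyl < 6)
class(small_cars)  # "cars" "data.frame"
```

Note that this reclass.default only restores the first class of `x`; a class-specific reclass method could restore more classes or attributes if needed.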


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM