简体   繁体   中英

Adjust function to work with dplyr/magrittr

I have:

df <- data_frame(
  a = 1:2, 
  b = list(1:10, 4:40)
)

and

foo <- function(x) mean(unlist(x))

The following works as expected:

df$b %>% foo

However, I was not able to figure out which modifications of foo are needed in order for df %>% foo(b) to work.

You can pass an ... parameter directly to the vars helper of summarise_at , eg

foo <- function(.tbl, ...){
    summarise_at(.tbl, 
                 vars(...), 
                 funs(mean(unlist(.))))
}

It works for single variables, list column or not:

df %>% foo(b)
## # A tibble: 1 × 1
##          b
##      <dbl>
## 1 18.48936

or multiple:

df %>% foo(a, b)
## # A tibble: 1 × 2
##       a        b
##   <dbl>    <dbl>
## 1   1.5 18.48936

To go further with NSE, check out lazyeval , which is the package dplyr uses to implement its NSE.

Also note that the SE/NSE system of dplyr has just been rebuilt in the development version (not on CRAN yet, and not yet documented).


Bonus points: Do it all in base R!

foo <- function(.tbl, ...){
    # collect dots as character vector
    cols <- as.character(substitute(list(...))[-1])
    cls <- class(.tbl)

    # handle grouped tibbles properly
    if('grouped_df' %in% cls){
        cls <- cls[which(cls != 'grouped_df')]    # drop grouping
        res <- aggregate(.tbl[cols], 
                         .tbl[attr(.tbl, 'vars')], 
                         FUN = function(x){mean(unlist(x))})
    } else {
        res <- as.data.frame(lapply(.tbl[cols], function(x){mean(unlist(x))}))
    }

    class(res) <- cls    # keep class (tibble, etc.)
    res
}

which works with list columns, groups, and multiple columns or groups, keeping class but dropping grouping:

df %>% foo(a, b)
## # A tibble: 1 × 2
##       a        b
##   <dbl>    <dbl>
## 1   1.5 18.48936

df %>% group_by(a) %>% foo(b)
## # A tibble: 2 × 2
##       a     b
##   <int> <dbl>
## 1     1   5.5
## 2     2  22.0

mtcars %>% foo(mpg, hp)
##        mpg       hp
## 1 20.09062 146.6875

mtcars %>% group_by(cyl, am) %>% foo(hp, mpg)
## # A tibble: 6 × 4
##     cyl    am        hp      mpg
##   <dbl> <dbl>     <dbl>    <dbl>
## 1     4     0  84.66667 22.90000
## 2     6     0 115.25000 19.12500
## 3     8     0 194.16667 15.05000
## 4     4     1  81.87500 28.07500
## 5     6     1 131.66667 20.56667
## 6     8     1 299.50000 15.40000

You could get to the result this way:

library(dplyr)
library(purrr)
df %>% 
    mutate_(mn=~map_dbl(b, mean), size=~map_dbl(b, length)) %>%
    summarize_(m=~weighted.mean(mn, size))
#  m
#   <dbl>
# 1 18.49

Or define foo in this way:

foo2 <- function(d, col) {
    col_name <- as.character(substitute(col))
    mean(unlist(d[[col_name]]))
}
df %>% foo2(b)
[1] 18.49

you can update the function with something like:

foo <- function(df, x) {
  x <- df[ deparse(substitute(x))] 
  mean(unlist(x)) }

df %>% foo(b)
[1] 18.48936

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM