How can I get data attributes from rlang's .data like I can with .?

I am building a tidyverse-compatible function for use inside dplyr's mutate. I'd like to pass it a variable along with the data set I'm working with, and use information from both to build a vector.

As a basic example, imagine I want to return a string containing the mean of the variable and the number of rows in the data set. (I know I could just take the length of var; ignore that, it's only an example.)

library(tidyverse)
library(rlang)

info <- function(var,df = get(".",envir = parent.frame())) {
  paste(mean(var),nrow(df),sep=', ')
}

dat <- data.frame(a = 1:10, i = c(rep(1,5),rep(2,5)))

#Works fine, 'types' contains '5.5, 10'
dat %>% mutate(types = info(a))

OK, great so far. But now suppose I want it to work with grouped data. var will come from just one group, but . would still be the full data set. So instead I'll use rlang's .data pronoun, which refers to just the data currently being worked with.

However, .data is not like . (the dot). The dot is the data set itself, whereas .data is just a pronoun from which I can pull variables with .data[[varname]].

info2 <- function(var,df = get(".data",envir = parent.frame())) {
  paste(mean(var),nrow(df),sep=', ')
}

#Doesn't work. nrow(.data) gives blank strings
dat %>% group_by(i) %>% mutate(types = info2(a))
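
For contrast, here is a minimal sketch of what the pronoun does support: looking up a single column by name, already subsetted to the current group. The string varname and the column grp_mean are made up for illustration and are not part of my real function.

```r
library(dplyr)

dat <- data.frame(a = 1:10, i = c(rep(1, 5), rep(2, 5)))

# Hypothetical: the column name is held in a string and looked up via .data
varname <- "a"

res <- dat %>%
  group_by(i) %>%
  mutate(grp_mean = mean(.data[[varname]])) %>%
  ungroup()

# One within-group mean per row
res$grp_mean
#> [1] 3 3 3 3 3 8 8 8 8 8
```

So column access through .data respects the grouping, but that gives me one variable at a time, not a data frame with its attributes.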

How can I get the full data frame from .data? I know I didn't include it in the example, but specifically I need both some information from the attributes of dat (via attr()) AND some information from the variables in dat, properly subsetted for the grouping. So neither reverting to . nor just pulling out individual variables would work.

As Alexis mentioned in the comments, this is not possible, since it's not the intended use of .data. However, having given up on doing this directly, I've worked up a kludge using a combination of . and .data.

info <- function(var,df = get(".",envir = parent.frame())) {
  #First, get any information you need from .
  fulldatasize <- nrow(df)

  #Then, check if you actually need .data,
  #i.e. the data is grouped and you need a subsample
  if (length(var) < nrow(df)) {
      #If so, get the list of variables you want from .data, maybe all of them
      namesiwant <- names(df)

      #Get .data
      datapronoun <- get('.data',envir=parent.frame())

      #And remake df using just the subsample
      df <- data.frame(lapply(namesiwant, function(x) datapronoun[[x]]))
      names(df) <- namesiwant
  }

  #Now do whatever you want with the .data data
  groupsize <- nrow(df)

  paste(mean(var),groupsize,fulldatasize,sep=', ')
}

dat <- data.frame(a = 1:10, i = c(rep(1,5),rep(2,5)))

#types contains the within-group mean, then 5, then 10
dat %>% group_by(i) %>% mutate(types = info(a))
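
If looking things up in the calling frame feels too fragile, a possible alternative is to pass the full data set explicitly and ask dplyr for the group size with n(). This is only a sketch under assumptions: info3 and its full parameter are hypothetical names, and it relies on dplyr::n() reporting the current group size when called from a helper that runs inside mutate.

```r
library(dplyr)

dat <- data.frame(a = 1:10, i = c(rep(1, 5), rep(2, 5)))

# Hypothetical helper: `full` is the ungrouped data, passed in by the caller
info3 <- function(var, full) {
  groupsize <- dplyr::n()        # size of the group currently being processed
  fulldatasize <- nrow(full)     # attributes of `full` are also available here
  paste(mean(var), groupsize, fulldatasize, sep = ", ")
}

res <- dat %>%
  group_by(i) %>%
  mutate(types = info3(a, dat)) %>%
  ungroup()

unique(res$types)
#> [1] "3, 5, 10" "8, 5, 10"
```

Any other group-subsetted variables the helper needs can be passed the same way as var, since mutate hands over each argument already restricted to the current group.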

Why not use length() instead of nrow() here?

dat <- data.frame(a = 1:10, i = c(rep(1,5),rep(2,5)))

info <- function(var) {
  paste(mean(var),length(var),sep=', ')
}
dat %>% group_by(i) %>% mutate(types = info(a))
#> # A tibble: 10 x 3
#> # Groups:   i [2]
#>        a     i types
#>    <int> <dbl> <chr>
#>  1     1     1 3, 5 
#>  2     2     1 3, 5 
#>  3     3     1 3, 5 
#>  4     4     1 3, 5 
#>  5     5     1 3, 5 
#>  6     6     2 8, 5 
#>  7     7     2 8, 5 
#>  8     8     2 8, 5 
#>  9     9     2 8, 5 
#> 10    10     2 8, 5
