
dplyr: writing a function that groups by several sets of variables

I've read the "Programming with dplyr" vignette ( https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html ), which is very useful.

I often build fairly complex functions that involve several sets of grouping variables. For example, given a dataset df, I may want the function to summarise by some variables (say grouping variables G1 and G2), then summarise by others (say G3), and then use these summaries together to produce a final result.

library(dplyr)

df <- data.frame(xV = 1:3, yV = 0:2, G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))
# Within my function I want to calculate
# (a) the mean of xV by G1 and G2
df %>% group_by(G1, G2) %>% summarise(MEANS1 = mean(xV, na.rm = TRUE))
# as well as (b) the mean of xV by G3
df %>% group_by(G3) %>% summarise(MEAN2 = mean(xV, na.rm = TRUE))

If I only needed the first grouping, (a), I could build a function using the ellipsis (...):

library(dplyr)

TAB2 <- function(data, x, ...) {
  x <- enquo(x)             # quote the variable to summarise
  groupSet1 <- enquos(...)  # quote every grouping variable passed via ...

  data %>%
    group_by(!!!groupSet1) %>%
    summarise(MEAN = mean(!!x, na.rm = TRUE))
}

#Which gives me my results
TAB2(data=df,x=xV,G1,G2)
# A tibble: 2 x 3
# Groups:   G1 [2]
     G1    G2  MEAN
  <dbl> <dbl> <dbl>
1     0     1   3  
2     1     0   1.5
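As an aside, since rlang 0.4.0 the enquo()/!! pair can be replaced by embracing the argument with {{ }}, and the dots can be forwarded to group_by() unchanged. A minimal sketch of the same function under those assumptions (TAB2b is a hypothetical name; the .groups argument needs dplyr 1.0.0 or later and is set here so the result comes back ungrouped):

```r
library(dplyr)

df <- data.frame(xV = 1:3, yV = 0:2, G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical variant of TAB2: `...` goes straight to group_by(),
# and {{ x }} replaces the enquo(x) / !!x pair
TAB2b <- function(data, x, ...) {
  data %>%
    group_by(...) %>%
    summarise(MEAN = mean({{ x }}, na.rm = TRUE), .groups = "drop")
}

res <- TAB2b(df, xV, G1, G2)
res
```

This still only handles one grouping set, because everything after x is swallowed by the dots.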

But if I want to do both (a) and (b), I need some way to distinguish the first set of grouping variables (G1, G2) from the second (G3). I can't just put all the grouping variables after the other inputs. Is there any way to specify these two sets in the input, something along the lines of:

# Sketch of the interface I'm after; group_by(GroupSet1) does not work as written
TAB3 <- function(data, x, y, GroupSet1 = c(G1, G2), GroupSet2 = G3) {
  x <- enquo(x)
  y <- enquo(y)
  # (a)
  data %>% group_by(GroupSet1) %>% summarise(MEANS1 = mean(!!x, na.rm = TRUE))
  # as well as (b)
  data %>% group_by(GroupSet2) %>% summarise(MEAN2 = mean(!!y, na.rm = TRUE))
}

I have tried to "quote" the two sets in several ways, similar to x <- enquo(x), but I always get an error. Could you please help? It would be even better if I could also pass a list of variables as x and y to summarise_at, which would make the function as generic as possible. Basically, I'm trying to create a template function that takes several variable sets x and y as well as several grouping sets, and returns the mean of the variables in x and y by the corresponding grouping sets (G1 and G2, and G3, respectively).

You can try passing the grouping sets as a list of character vectors and mapping over it:

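library(dplyr)
library(purrr)

# For each character vector of grouping columns, compute the mean of the columns named in y
TAB3 <- function(data, y, grouping_list) {
  map(grouping_list, ~ group_by_at(data, .) %>%
        summarise_at(y, list(Mean = mean), na.rm = TRUE))
}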

TAB3(df, "xV", list(c("G1", "G2"), c("G3"))) 
[[1]]
# A tibble: 2 x 3
# Groups:   G1 [2]
     G1    G2  Mean
  <dbl> <dbl> <dbl>
1     0     1   3  
2     1     0   1.5

[[2]]
# A tibble: 1 x 2
     G3  Mean
  <dbl> <dbl>
1     1     2
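group_by_at() and summarise_at() are superseded in current dplyr. The same character-vector interface can be sketched with pick() and across() instead, assuming dplyr 1.1.0 or later for pick(); TAB3_chr is a hypothetical name, and lapply() is used in place of purrr::map() only to drop a dependency:

```r
library(dplyr)

df <- data.frame(xV = 1:3, yV = 0:2, G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical re-write of TAB3 without the *_at() verbs:
# y and each element of grouping_list are character vectors of column names
TAB3_chr <- function(data, y, grouping_list) {
  lapply(grouping_list, function(g) {
    data %>%
      group_by(pick(all_of(g))) %>%
      summarise(across(all_of(y), ~ mean(.x, na.rm = TRUE), .names = "{.col}_mean"),
                .groups = "drop")
  })
}

res <- TAB3_chr(df, c("xV", "yV"), list(c("G1", "G2"), "G3"))
res
```

all_of() makes the selection fail loudly if a name in y or a grouping set is not a column of data.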

Update based on new info: if you want to use the ellipsis as in your TAB2 example, you could try:

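library(dplyr)

TAB3 <- function(df, x, ...) {
  # Turn each named set of bare column names into a character vector,
  # e.g. GroupSet1 = c(G1, G2) becomes c("G1", "G2")
  args <- substitute(list(...))
  names_env <- setNames(as.list(names(df)), names(df))
  arg_list <- eval(args, names_env)

  out <- vector(mode = "list", length = length(arg_list))

  # One summary per grouping set: the mean of every column selected by x
  for (i in seq_along(arg_list)) {
    out[[i]] <- df %>%
      group_by(!!!syms(arg_list[[i]])) %>%
      summarise_at(vars(!!!enquos(x)), .funs = list(mean = mean), na.rm = TRUE)
  }
  out
}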
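To see what the substitute()/eval() lines at the top of this function produce, they can be pulled out into a small helper (capture_sets is a hypothetical name, not part of dplyr). A minimal sketch:

```r
df <- data.frame(xV = 1:3, yV = 0:2, G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical helper isolating the first three lines of TAB3
capture_sets <- function(df, ...) {
  # The unevaluated call, e.g. list(GroupSet1 = c(G1, G2), GroupSet2 = G3)
  args <- substitute(list(...))
  # A list mapping each column name to itself as a string: list(xV = "xV", ...)
  names_env <- setNames(as.list(names(df)), names(df))
  # Evaluate the call with bare column names looked up in that list
  eval(args, names_env)
}

sets <- capture_sets(df, GroupSet1 = c(G1, G2), GroupSet2 = G3)
str(sets)
```

The result is list(GroupSet1 = c("G1", "G2"), GroupSet2 = "G3"), which is why syms() can then turn each element back into symbols for group_by().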

TAB3(df, x = c(xV,yV), GroupSet1=c(G1,G2), GroupSet2=G3)

#[[1]]
# A tibble: 2 x 4
# Groups:   G1 [2]
#     G1    G2 xV_mean yV_mean
#  <dbl> <dbl>   <dbl>   <dbl>
#1     0     1     3       2  
#2     1     0     1.5     0.5

#[[2]]
# A tibble: 1 x 3
#     G3 xV_mean yV_mean
#  <dbl>   <dbl>   <dbl>
#1     1       2       1
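On newer versions of dplyr (1.1.0 or later, an assumption here since the answers above predate it), the interface from the question can be written almost as sketched, by embracing each grouping set with {{ }} inside pick(). A minimal sketch (TAB4 is a hypothetical name):

```r
library(dplyr)

df <- data.frame(xV = 1:3, yV = 0:2, G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# GroupSet1 / GroupSet2 accept tidyselect expressions such as c(G1, G2)
TAB4 <- function(data, x, y, GroupSet1, GroupSet2) {
  # (a) mean of x by the first set of grouping variables
  a <- data %>%
    group_by(pick({{ GroupSet1 }})) %>%
    summarise(MEANS1 = mean({{ x }}, na.rm = TRUE), .groups = "drop")
  # (b) mean of y by the second set
  b <- data %>%
    group_by(pick({{ GroupSet2 }})) %>%
    summarise(MEAN2 = mean({{ y }}, na.rm = TRUE), .groups = "drop")
  list(a, b)
}

res <- TAB4(df, xV, yV, GroupSet1 = c(G1, G2), GroupSet2 = G3)
res
```

Because pick() takes a tidyselect expression, each set can also be a helper such as starts_with("G").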
