简体   繁体   中英

How to group_by without creating a grouping variable?

I need to perform a basic group_by / mutate operation using an auxiliary grouping variable. For instance:

df <- data.frame(
  u = c(0, 0, 1, 0, 1),
  v = c(8, 4, 2, 3, 5)
)

df %>%
  group_by(tmp = cumsum(u)) %>%
  mutate(w = cumprod(v)) %>%
  ungroup %>%
  select(-tmp)

My problem is that if df happens to already contain a column named tmp I will lose it.

Of course I could choose a very exotic name instead of tmp to reduce the likeliness of a collision (or I could even choose something like strrep("z", max(nchar(names(df))) + 1) to be sure) but I'd prefer to have a cleaner solution.

In other words, I'm looking for the dplyr equivalent of this data.table line:

setDT(df)[, w := cumprod(v), by = cumsum(u)]

We could create a function to take care of this. Assuming that the temporary grouping variable to be created is 'tmp', by concatenating with the column names of the dataset and calling make.unique , if there is already a 'tmp' column in the dataset, the duplicate one will be renamed as 'tmp.1'. Using the !! , naming the column with 'tmp.1' (from nm1 ) will not affect the 'tmp' already present in the dataset. In case, if there is no 'tmp', column, the grouping column will be named as 'tmp' and later removed with select

f1 <- function(dat, grpCol, Col) {
  grpCol <- enquo(grpCol)
  Col <- enquo(Col)

 changeCol <- "tmp"
 nm1 <-  tail(make.unique(c(names(dat), changeCol)), 1)
 dat %>%
    group_by(!! (nm1) := cumsum(!! grpCol)) %>%
    mutate(w = cumprod(!!Col)) %>%
     ungroup %>%
     select(-one_of(nm1)) 


}

-run the function

f1(df, u, v)
# A tibble: 5 x 3
#      u     v     w
#  <dbl> <dbl> <dbl>
#1  0     8.00  8.00
#2  0     4.00 32.0 
#3  1.00  2.00  2.00
#4  0     3.00  6.00
#5  1.00  5.00  5.00


 f1(df %>% mutate(tmp = 1), u, v) #create a 'tmp' column in dataset
# A tibble: 5 x 4
#      u     v   tmp     w
#  <dbl> <dbl> <dbl> <dbl>
#1  0     8.00  1.00  8.00
#2  0     4.00  1.00 32.0 
#3  1.00  2.00  1.00  2.00
#4  0     3.00  1.00  6.00
#5  1.00  5.00  1.00  5.00

As a followup (comments from @Frank) about passing expressions

expr <- quos(tmp = cumsum(u), w = cumprod(v))
#additional checks outside the function
names(expr)[1] <- if(names(expr)[1] %in% names(df)) 
             strrep(names(expr)[1], 2) else names(expr)[1]


f2 <- function(dat, exprs ){

dat %>%
    group_by(!!! exprs[1]) %>%
    mutate(!!! exprs[2])

}

f2(df, expr)
# A tibble: 5 x 4
# Groups: tmp [3]
#      u     v   tmp     w
#  <dbl> <dbl> <dbl> <dbl> 
#1  0     8.00  0     8.00
#2  0     4.00  0    32.0 
#3  1.00  2.00  1.00  2.00
#4  0     3.00  1.00  6.00
#5  1.00  5.00  2.00  5.00

You could use ave instead:

df %>% mutate(w = ave(v, cumsum(u), FUN = cumprod))

by would also work:

df %>% 
   by(cumsum(.$u), mutate, w = cumprod(v)) %>% 
   unclass %>% 
   bind_rows

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM