简体   繁体   中英

Keep empty groups when grouping with data.table in R

I want to keep empty groups (with a default value like NA or 0) when grouping by multiple conditions.

dt = data.table(user = c("A", "A", "B"), date = c("t1", "t2", "t1"), duration = c(1, 2, 1))
dt[, .("total" = sum(duration)), by = .(date, user)]

Result:

   date user total
1:   t1    A     1
2:   t2    A     2
3:   t1    B     1

Desired result:

   date user total
1:   t1    A     1
2:   t2    A     2
3:   t1    B     1
3:   t2    B    NA

One solution could be to add rows with 0 values before grouping, but it would require to create the Descartes product of many columns and manually checking if a value already exists for that combination, but I would prefer a built-in / simpler one.

You can try:

dt[CJ(user = user, date = date, unique = TRUE), on = .(user, date)]

   user date duration
1:    A   t1        1
2:    A   t2        2
3:    B   t1        1
4:    B   t2       NA

Here is an option with complete from tidyr

library(tidyr)
library(dplyr)
dt1 <- dt[, .("total" = sum(duration)), by = .(date, user)]
dt1 %>%
    complete(user, date)
#   user  date  total
#  <chr> <chr> <dbl>
#  A     t1        1
#2 A     t2        2
#3 B     t1        1
#4 B     t2       NA

Or using dcast/melt

melt(dcast(dt, user ~ date, value.var = 'duration', sum), 
      id.var = 'user', variable.name = 'date', value.name = 'total')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM