R: data.table aggregate using external grouping vector

I have the following data:

library(data.table)
dt <- data.table(time=as.POSIXct(c("2018-01-01 01:01:00","2018-01-01 01:05:00","2018-01-01 01:01:00")), y=c(1,10,9))
> dt
                  time  y
1: 2018-01-01 01:01:00  1
2: 2018-01-01 01:05:00 10
3: 2018-01-01 01:01:00  9 

and I would like to aggregate by time. Usually, I would do:

dt[,list(sum=sum(y),count=.N), by="time"]
                  time sum count
1: 2018-01-01 01:01:00  10     2
2: 2018-01-01 01:05:00  10     1

but this time, I would also like to get zero values for the minutes in between, i.e.:

                  time sum count
1: 2018-01-01 01:01:00  10     2
2: 2018-01-01 01:02:00   0     0
3: 2018-01-01 01:03:00   0     0
4: 2018-01-01 01:04:00   0     0
5: 2018-01-01 01:05:00  10     1

Could this be done, for example, using an external vector

times <- seq(from=min(dt$time),to=max(dt$time),by="mins")

that can be fed to the data.table function as a grouping variable?

You would typically do this with a join (either before or after the aggregation). For example:

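# Join onto the full minute sequence; minutes without data get y = NA
dt <- dt[J(times), on = "time"]
# .N would also count the NA filler rows, so count the non-missing y values instead
dt[, list(sum = sum(y, na.rm = TRUE), count = sum(!is.na(y))), by = time]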
#                  time sum count
#1: 2018-01-01 01:01:00  10     2
#2: 2018-01-01 01:02:00   0     0
#3: 2018-01-01 01:03:00   0     0
#4: 2018-01-01 01:04:00   0     0
#5: 2018-01-01 01:05:00  10     1

Or in a "piped" version:

dt[J(times), on = "time"][
  , .(sum = sum(y, na.rm = TRUE), count= sum(!is.na(y))), 
  by = time]
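
The answer mentions that the join can also come after the aggregation. Here is a minimal sketch of that variant, together with one that aggregates during the join itself using by = .EACHI. The names grid, agg, res_after and res_eachi are made up for illustration, and tz = "UTC" is an assumption added only to make the example reproducible; the original data does not specify a time zone.

```r
library(data.table)

# Same data as in the question; tz = "UTC" is an assumption for reproducibility.
dt <- data.table(
  time = as.POSIXct(c("2018-01-01 01:01:00",
                      "2018-01-01 01:05:00",
                      "2018-01-01 01:01:00"), tz = "UTC"),
  y = c(1, 10, 9)
)

# Full minute grid, stored under the same column name as in dt.
times <- seq(from = min(dt$time), to = max(dt$time), by = "mins")
grid <- data.table(time = times)

# Variant 1: aggregate first, then join the result onto the grid.
agg <- dt[, .(sum = sum(y), count = .N), by = time]
res_after <- agg[grid, on = "time"]
# Minutes without observations come back as NA; set them to zero.
res_after[is.na(count), `:=`(sum = 0, count = 0L)]

# Variant 2: aggregate inside the join, one group per row of the grid.
# Unmatched minutes see y = NA, hence na.rm = TRUE and !is.na(y).
res_eachi <- dt[grid, on = "time",
                .(sum = sum(y, na.rm = TRUE), count = sum(!is.na(y))),
                by = .EACHI]
```

Both results should contain one row per minute between the first and last timestamp. Variant 2 avoids building the intermediate joined table, which may matter when the grid is large.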
