简体   繁体   中英

dplyr into data.table: filter > group by > count

I usually work with dplyr but face a rather large data set and my approach is very slow. I basically need to filter a df group it by dates and count the occurrence within

sample data (turned already everything into data.table )

library(data.table)
library(dplyr)

set.seed(123)

df <- data.table(startmonth = seq(as.Date("2014-07-01"),as.Date("2014-11-01"),by="months"),
                 endmonth = seq(as.Date("2014-08-01"),as.Date("2014-12-01"),by="months")-1)


df2 <- data.table(id = sample(1:10, 5, replace = T),
                  start = sample(seq(as.Date("2014-07-01"),as.Date("2014-10-01"),by="days"),5),
                  end = df$startmonth + sample(10:90,5, replace = T)
)

#cross joining
res <- setkey(df2[,c(k=1,.SD)],k)[df[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]

My dplyr approach works but is slow

res %>% filter(start <=endmonth & end>= startmonth) %>% 
  group_by(startmonth,endmonth) %>% 
  summarise(countmonth=n()) 

My data.table knowledge is limited but I guess we would setkeys() on the date columns and something like res[ , :=( COUNT = .N , IDX = 1:.N ) , by = startmonth, endmonth] to get the counts by group but I'm not sure how the filter goes in there.

Appreciate your help!

You could do the counting inside the join:

df2[df, on=.(start <= endmonth, end >= startmonth), allow.cartesian=TRUE, .N, by=.EACHI]

        start        end N
1: 2014-07-31 2014-07-01 1
2: 2014-08-31 2014-08-01 4
3: 2014-09-30 2014-09-01 5
4: 2014-10-31 2014-10-01 3
5: 2014-11-30 2014-11-01 3

or add it as a new column in df :

df[, n := 
  df2[.SD, on=.(start <= endmonth, end >= startmonth), allow.cartesian=TRUE, .N, by=.EACHI]$N
]

How it works. The syntax is x[i, on=, allow.cartesian=, j, by=.EACHI] . Each row if i is used to look up values in x . The symbol .EACHI indicates that aggregation ( j=.N ) will be done for each row of i .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM