
Counting the Number of Events Still in Progress When a New Event Begins (RevoScaleR/mrsdeploy)

Here is some example data:

```r
Begin <- c("10-10-2010 12:15:35", "10-10-2010 12:20:52", "10-10-2010 12:23:45",
           "10-10-2010 12:25:01", "10-10-2010 12:30:29")
End   <- c("10-10-2010 12:24:23", "10-10-2010 12:23:30", "10-10-2010 12:45:15",
           "10-10-2010 12:32:11", "10-10-2010 12:45:05")
# Keep the timestamps as character so they can be parsed into date-times later
df <- data.frame(Begin, End, stringsAsFactors = FALSE)
```

For each event, I want to count how many other events have not yet finished when it begins, and record that count in a new column. For this example, the desired result is a column with the values 0, 1, 1, 1, 2.
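A small brute-force sketch in base R can pin down the requested rule before looking at faster approaches. It assumes an event counts as still running only if it began strictly before the new event and ends strictly after the new event begins. The variable names (begin, end, open) are illustrative, and base as.POSIXct is used instead of lubridate so the snippet runs on its own. Because it builds an n-by-n matrix, it is only suitable for checking small inputs.

```r
# Example data from the question, parsed with base R (no lubridate needed here)
fmt <- "%d-%m-%Y %H:%M:%S"
begin <- as.POSIXct(c("10-10-2010 12:15:35", "10-10-2010 12:20:52", "10-10-2010 12:23:45",
                      "10-10-2010 12:25:01", "10-10-2010 12:30:29"), format = fmt, tz = "UTC")
end   <- as.POSIXct(c("10-10-2010 12:24:23", "10-10-2010 12:23:30", "10-10-2010 12:45:15",
                      "10-10-2010 12:32:11", "10-10-2010 12:45:05"), format = fmt, tz = "UTC")

# Work on seconds since the epoch so the comparisons are plain numeric ones
b <- as.numeric(begin)
e <- as.numeric(end)

# open[i, j] is TRUE when event j began strictly before event i
# and ends strictly after event i begins
open <- outer(b, b, ">") & outer(b, e, "<")
expected <- rowSums(open)
print(expected)  # 0 1 1 1 2
```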

I have a data.table solution that works fine. I would like a solution that works with the RevoScaleR/mrsdeploy packages, so that the program can take advantage of parallel computing and data chunking.

Here is the solution that works in data.table:

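```r
library(lubridate)
library(data.table)
# Parse the day-month-year timestamps into POSIXct
df <- as.data.frame(lapply(df, dmy_hms))
dt <- as.data.table(df)
# Key by interval (foverlaps needs a keyed y) and number rows in key order
setkey(dt, Begin, End)[, id := .I]
# Count, per later interval (id), the earlier intervals (i.id) that overlap it;
# merge back and give intervals with no overlapping predecessor N = 0
counts <- foverlaps(dt, dt)[id > i.id, .N, by = "Begin,End"]
merge(dt, counts, all.x = TRUE)[, id := NULL][is.na(N), N := 0L][]
```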

Again, I am looking for a solution that can be executed remotely on SQL Server 2016 with the packages mentioned above.

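Process the begin and end times in ascending order and keep a running count of how many begins and ends you have seen; the number of events in progress at any point is the begins seen minus the ends seen. As long as there are no duplicate or spurious end events, this works fine.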
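The begin/end sweep described above can be sketched in base R as follows. This is a minimal sketch, not the answerer's own code: the function name count_running_sweep is hypothetical, it assumes every event ends after it begins and that no two events begin at exactly the same instant, and when an end and a begin share a timestamp it processes the end first so the finished event is not counted.

```r
# Example data from the question, parsed with base R
fmt <- "%d-%m-%Y %H:%M:%S"
begin <- as.POSIXct(c("10-10-2010 12:15:35", "10-10-2010 12:20:52", "10-10-2010 12:23:45",
                      "10-10-2010 12:25:01", "10-10-2010 12:30:29"), format = fmt, tz = "UTC")
end   <- as.POSIXct(c("10-10-2010 12:24:23", "10-10-2010 12:23:30", "10-10-2010 12:45:15",
                      "10-10-2010 12:32:11", "10-10-2010 12:45:05"), format = fmt, tz = "UTC")

# Hypothetical helper: walk all begin/end timestamps in ascending order,
# keeping a running count of events that have begun but not yet ended
count_running_sweep <- function(begin, end) {
  n <- length(begin)
  times <- c(as.numeric(begin), as.numeric(end))
  type  <- c(rep(1L, n), rep(-1L, n))  # 1 = begin, -1 = end
  id    <- c(seq_len(n), seq_len(n))   # which event each timestamp belongs to
  # Sort by time; at equal times ends (-1) come before begins (1).
  # Simultaneous begins would be taken in input order (assumed not to occur).
  ord <- order(times, type)
  running <- 0L
  result <- integer(n)
  for (k in ord) {
    if (type[k] == 1L) {
      result[id[k]] <- running  # events still open when this one begins
      running <- running + 1L
    } else {
      running <- running - 1L
    }
  }
  result
}

count_running_sweep(begin, end)  # 0 1 1 1 2
```

Sorting makes this O(n log n) instead of the O(n^2) of comparing every pair, but the loop itself is sequential: each count depends on every earlier timestamp, so it does not split naturally across independent data chunks.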

A simple sapply seems to do it, once the Begin and End columns have been converted to date-times:

```r
# For each begin time x, count events that started strictly before x and end after x
sapply(df$Begin, function(x) sum((x < df$End) & (x > df$Begin)))
```

To parallelize it, just use rxExec, mclapply, parLapply, foreach, etc.
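As one example of the parallel route, the sketch below splits the row indices into contiguous chunks and hands them to parLapply from R's bundled parallel package, applying the same rule as the sapply answer inside each chunk. Assumptions: the helper names (count_running_chunk, count_running_parallel) and the default of two workers are hypothetical; every worker receives the full begin and end vectors, so only the comparisons are divided, not the data; and rxExec, which would be the RevoScaleR counterpart, is not shown here.

```r
library(parallel)

# Example data from the question, parsed with base R
fmt <- "%d-%m-%Y %H:%M:%S"
begin <- as.POSIXct(c("10-10-2010 12:15:35", "10-10-2010 12:20:52", "10-10-2010 12:23:45",
                      "10-10-2010 12:25:01", "10-10-2010 12:30:29"), format = fmt, tz = "UTC")
end   <- as.POSIXct(c("10-10-2010 12:24:23", "10-10-2010 12:23:30", "10-10-2010 12:45:15",
                      "10-10-2010 12:32:11", "10-10-2010 12:45:05"), format = fmt, tz = "UTC")

# Hypothetical worker function: for the events whose row indices are in idx,
# count events that began strictly earlier and end after that event begins
count_running_chunk <- function(idx, begin, end) {
  vapply(idx, function(i) sum(begin < begin[i] & end > begin[i]), integer(1))
}

# Hypothetical driver: split row indices into contiguous chunks, one per worker
count_running_parallel <- function(begin, end, workers = 2L) {
  b <- as.numeric(begin)
  e <- as.numeric(end)
  chunks <- split(seq_along(b), cut(seq_along(b), workers, labels = FALSE))
  cl <- makeCluster(workers)  # separate R worker processes (PSOCK cluster)
  on.exit(stopCluster(cl))
  res <- parLapply(cl, chunks, count_running_chunk, begin = b, end = e)
  unlist(res, use.names = FALSE)  # chunks are contiguous, so row order is kept
}

count_running_parallel(begin, end)  # 0 1 1 1 2
```

A socket cluster is used rather than mclapply because mclapply relies on forking, which is not available on Windows, where SQL Server 2016 runs. For five rows the cluster start-up cost far outweighs the work; the split only pays off on large inputs.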

I found that the quickest way to do this was in T-SQL. The approach is described here: http://sqlmag.com/t-sql/intervals-and-counts-part-1

It could also be translated to R easily by anyone doing this in the future, but I chose to do the whole operation in T-SQL.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 