I have a list of people and their working start and end times during a day. I want to plot a curve showing the total number of people working at any given minute in the day. What I could do is add 1440 additional conditional boolean variables, one for each minute of the day, and sum them up, but that seems very inelegant. I'm wondering if there is a better way to do it (integrals?).
Here's the code to generate a df with my sample data:
sample_wt <- function() {
  require(lubridate)
  set.seed(10)
  worktime <- data.frame(
    ID = c(1:100),
    start = now()+abs(rnorm(100,4800,2400))
  )
  worktime$end <- worktime$start + abs(rnorm(100,20000,10000))
  worktime$length <- difftime(worktime$end, worktime$start, units="mins")
  worktime
}
To create the sample data, you can do something like:
DF <- sample_wt()
Here is one option using the IRanges package from Bioconductor.
library(IRanges)
## generate sample
DF <- sample_wt()
## create the range from the sample data
rangesA <- IRanges(as.numeric(DF$start), as.numeric(DF$end))
## create one minute range
xx = seq(min(DF$start),max(DF$end),60)
rangesB <- IRanges(as.numeric(xx),as.numeric(xx+60))
## count the overlaps
ov <- countOverlaps(rangesB, rangesA, type="within")
## plot the result
plot(xx,ov,type='l')
I don't have lubridate installed, so I produced the data.frame (called worktime here) through Sys.time instead of now (I guess they should be similar). This could do the trick:
minutes<-seq(as.POSIXct(paste(sep="",Sys.Date()," 00:00:00")),by="min",length.out=24*60)
rowSums(outer(minutes,worktime$start,">") & outer(minutes,worktime$end,"<"))
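To see what the outer() expression computes, here is a minimal sketch on made-up data. The vectors start, end and minutes are hypothetical and hold plain numbers (minutes since midnight) rather than date-times, so the result is easy to check by hand.

```r
# Hypothetical toy data: three people, times in minutes since midnight
start   <- c(0, 2, 4)
end     <- c(5, 6, 9)
minutes <- 0:9

# outer() builds a length(minutes) x length(start) logical matrix;
# entry [i, j] is TRUE when person j is working at minutes[i]
working <- outer(minutes, start, ">") & outer(minutes, end, "<")

# Summing each row counts the people working at that minute
counts <- rowSums(working)
print(counts)
```

Because the comparisons are strict, a person is not counted at the exact minute they start or finish; use ">=" and "<=" if the boundaries should be included.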
Surely it can be improved, but this seems to do it:
time_range <- seq(min(DF$start), max(DF$end), 60)
result <- integer(length(time_range))
for (t in seq_along(time_range)) {
  result[t] <- sum(DF$start <= time_range[t] & DF$end >= time_range[t])
}
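The same counts can also be obtained without a loop: the number of people working at time t is the number who have started by t minus the number who finished before t. The following is a minimal sketch of that idea using findInterval() from base R, on made-up numeric data. The names start, end and grid are hypothetical, and for the POSIXct columns in DF it assumes you would wrap them in as.numeric() first.

```r
# Hypothetical toy data: times as minutes since midnight
start <- c(0, 2, 4)
end   <- c(5, 6, 9)
grid  <- 0:9

# Number of people whose start is <= t, for every t in grid
started <- findInterval(grid, sort(start))

# Number of people whose end is strictly < t (left.open = TRUE)
finished <- findInterval(grid, sort(end), left.open = TRUE)

counts_fast <- started - finished

# Reference: the explicit condition start <= t & end >= t
counts_loop <- vapply(grid, function(t) sum(start <= t & end >= t), integer(1))

print(counts_fast)
```

This sorts each vector once and then does a binary search per grid point, instead of comparing every grid point against every person.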