I have a data.table l1 with three columns: Minute, Posixct for the time, and group_cor for my value. I would like to count the number of unique values of group_cor within the time intervals given by the data.table s1. My original dataset has about 1,500,000 rows spanning approximately 12 days (structured like l1), so I am looking for a fast method to process all of it.
Posixct group_cor Minute
1: 2017-08-11 13:31:36 185 2017-08-11 13:31:00
2: 2017-08-11 13:31:36 185 2017-08-11 13:31:00
3: 2017-08-11 13:31:36 185 2017-08-11 13:31:00
4: 2017-08-11 13:31:37 186 2017-08-11 13:31:00
5: 2017-08-11 13:31:37 186 2017-08-11 13:31:00
6: 2017-08-11 13:31:37 187 2017-08-11 13:31:00
7: 2017-08-11 13:31:37 187 2017-08-11 13:31:00
8: 2017-08-11 13:31:37 187 2017-08-11 13:31:00
9: 2017-08-11 13:31:37 187 2017-08-11 13:31:00
This is s1, where start marks the beginning of the time interval and end marks its end. Each interval is one minute long, and the window is moved along one second at a time.
start end
1: 2017-08-11 13:31:36 2017-08-11 13:32:36
2: 2017-08-11 13:31:37 2017-08-11 13:32:37
3: 2017-08-11 13:31:38 2017-08-11 13:32:38
4: 2017-08-11 13:31:39 2017-08-11 13:32:39
5: 2017-08-11 13:31:40 2017-08-11 13:32:40
I have tried using data.table to add a column No to s1, using the "on" argument to specify the time window:
oma <- function(x) length(unique(x))
s1[ l1, No:=oma(group_cor), on=c('start<Posixct','end>=Posixct')]
However, this gives
> s1
start end No
1: 2017-08-11 13:31:36 2017-08-11 13:32:36 188
2: 2017-08-11 13:31:37 2017-08-11 13:32:37 188
3: 2017-08-11 13:31:38 2017-08-11 13:32:38 188
4: 2017-08-11 13:31:39 2017-08-11 13:32:39 188
5: 2017-08-11 13:31:40 2017-08-11 13:32:40 188
The No column is 188 for all the time windows, which cannot be right (and I don't know where this value comes from):
> range(s1$No)
[1] 188 188
I know the number of unique values for each minute, and the new No values should be of similar magnitude:
> tapply(l1$group_cor, l1$Minute,oma)
2017-08-11 13:31:00 2017-08-11 13:32:00 2017-08-11 13:33:00 2017-08-11 13:34:00
11 17 18 17
2017-08-11 13:35:00 2017-08-11 13:36:00 2017-08-11 13:37:00 2017-08-11 13:38:00
21 22 23 22
2017-08-11 13:39:00 2017-08-11 13:40:00
20 22
What am I doing wrong? Any help would be highly appreciated, as would suggestions for other ways to do this. Thank you very much.
If I understand you correctly (and as Frank mentioned in the comments), you are looking for:
intvl[, cnt := dat[intvl, uniqueN(na.omit(group_cor)),
                   on = .(Posixct > start, Posixct <= end), by = .EACHI]$V1]
output:
                 start                 end cnt
1: 2017-08-11 13:31:36 2017-08-11 13:32:36   2
2: 2017-08-11 13:31:37 2017-08-11 13:32:37   0
3: 2017-08-11 13:31:38 2017-08-11 13:32:38   0
4: 2017-08-11 13:31:39 2017-08-11 13:32:39   0
5: 2017-08-11 13:31:40 2017-08-11 13:32:40   0
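For small samples you can sanity-check the join result with a brute-force loop over the intervals. This is only for verification (it scales as rows × windows, far too slow for 1.5M rows); it assumes the dat and intvl objects built in the data section below, recreated here so the snippet runs on its own:

```r
library(data.table)

# Rebuild the small sample (same values as in the data section below)
dat <- data.table(
  Posixct   = as.POSIXct(c(rep("2017-08-11 13:31:36", 3),
                           rep("2017-08-11 13:31:37", 6))),
  group_cor = c(185, 185, 185, 186, 186, 187, 187, 187, 187)
)
intvl <- data.table(
  start = as.POSIXct("2017-08-11 13:31:36") + 0:4,
  end   = as.POSIXct("2017-08-11 13:32:36") + 0:4
)

# Brute force: filter rows inside each (start, end] window, count distinct values
chk <- sapply(seq_len(nrow(intvl)), function(i)
  dat[Posixct > intvl$start[i] & Posixct <= intvl$end[i], uniqueN(group_cor)])
chk
```

Note the window is half-open, (start, end], matching the strict `start < Posixct` condition in your original attempt; that is why rows stamped exactly at an interval's start are excluded from its count.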
data:
library(data.table)
dat <- fread("Posixct,group_cor,Minute
2017-08-11 13:31:36,185,2017-08-11 13:31:00
2017-08-11 13:31:36,185,2017-08-11 13:31:00
2017-08-11 13:31:36,185,2017-08-11 13:31:00
2017-08-11 13:31:37,186,2017-08-11 13:31:00
2017-08-11 13:31:37,186,2017-08-11 13:31:00
2017-08-11 13:31:37,187,2017-08-11 13:31:00
2017-08-11 13:31:37,187,2017-08-11 13:31:00
2017-08-11 13:31:37,187,2017-08-11 13:31:00
2017-08-11 13:31:37,187,2017-08-11 13:31:00")
cols <- c("Posixct", "Minute")
dat[, (cols) := lapply(.SD, as.POSIXct, format="%Y-%m-%d %H:%M:%S"), .SDcols=cols]
intvl <- fread("start,end
2017-08-11 13:31:36,2017-08-11 13:32:36
2017-08-11 13:31:37,2017-08-11 13:32:37
2017-08-11 13:31:38,2017-08-11 13:32:38
2017-08-11 13:31:39,2017-08-11 13:32:39
2017-08-11 13:31:40,2017-08-11 13:32:40")
cols <- c("start", "end")
intvl[, (cols) := lapply(.SD, as.POSIXct, format="%Y-%m-%d %H:%M:%S"), .SDcols=cols]
As for why your original attempt gave 188 everywhere: without by = .EACHI, j is evaluated just once over all the matched rows together, so oma(group_cor) returned the number of unique group_cor values across the whole of l1 (apparently 188) instead of one count per window. by = .EACHI makes j run once per row of the table passed as i, which is why the join has to be arranged so that each interval is an i row.
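The effect of by = .EACHI can be seen on a tiny made-up table (the ids and values here are arbitrary, chosen only for illustration):

```r
library(data.table)

dt <- data.table(id = c(1L, 1L, 2L), v = c(10, 20, 20))
lk <- data.table(id = c(1L, 2L))

# Without by = .EACHI, j runs once over all matched rows combined:
dt[lk, uniqueN(v), on = "id"]               # a single overall count

# With by = .EACHI, j runs once per row of the i table:
dt[lk, uniqueN(v), on = "id", by = .EACHI]  # one count per id
```

The first call collapses everything into one number, which is exactly what happened in your s1[l1, ...] attempt: one uniqueN over the full join result, recycled into every row.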