logical operators in data.table with by-groups

Question

I am trying to apply a logical operator to columns of data.table, which works fine without by= groups, but with by= groups something weird is happening. I'm using R 3.3.1 and data.table 1.9.6 (off CRAN).

Here is an example data set:

library(data.table)
# x <- matrix(rnorm(10),ncol=2)
x <- structure(c(0.323618333400921, -0.103619160085676, -0.823322722581548, 
-0.0175726272468397, 2.17286179204364, 1.63465793636681, 0.258469984285221, 
-0.411622451584, 0.22594247704226, 0.114427086028551), .Dim = c(5L, 2L))

dt <- data.table(x)
dt$id <- c(1,1,1,2,2)
dt

##             V1         V2 id
## 1:  0.32361833  1.6346579  1
## 2: -0.10361916  0.2584700  1
## 3: -0.82332272 -0.4116225  1
## 4: -0.01757263  0.2259425  2
## 5:  2.17286179  0.1144271  2

Let's say I would like to know if cumulative minimum (that is, the minimal value in each column either from the start of sample to current row, or from the start of the by= group to the current row) is below zero. When there are no by= groups, we are good:

dt[, cummin(.SD), .SDcols=1:2]
##            V1         V2
## 1:  0.3236183  1.6346579
## 2: -0.1036192  0.2584700
## 3: -0.8233227 -0.4116225
## 4: -0.8233227 -0.4116225
## 5: -0.8233227 -0.4116225

> dt[, cummin(.SD)<0, .SDcols=1:2]
##         V1    V2
## [1,] FALSE FALSE
## [2,]  TRUE FALSE
## [3,]  TRUE  TRUE
## [4,]  TRUE  TRUE
## [5,]  TRUE  TRUE

So far so good. I can also compute actual cumulative minimums with by= groups:

dt[, cummin(.SD), by=id]
##    id          V1         V2
## 1:  1  0.32361833  1.6346579
## 2:  1 -0.10361916  0.2584700
## 3:  1 -0.82332272 -0.4116225
## 4:  2 -0.01757263  0.2259425
## 5:  2 -0.01757263  0.1144271

However, if I try to compute the indicator variables as before, the output is as follows:

dt[, (cummin(.SD)<0), by=id]
##     id    V1
##  1:  1 FALSE
##  2:  1  TRUE
##  3:  1  TRUE
##  4:  1 FALSE
##  5:  1 FALSE
##  6:  1  TRUE
##  7:  2  TRUE
##  8:  2  TRUE
##  9:  2 FALSE
## 10:  2 FALSE

Instead of 2 variables of 5 obs. each, we have one with 10. It appears that within each by-group, the variables are stacked on top of each other. How should I do this correctly?

Answer 1

It will work with lapply :

dt[,lapply(.SD,function(x) cummin(x)<0),by="id"]
   id    V1    V2
1:  1 FALSE FALSE
2:  1  TRUE FALSE
3:  1  TRUE  TRUE
4:  2  TRUE FALSE
5:  2  TRUE FALSE

As commented by David Arenburg, it seems that it works with cummin(.SD) because it is a data.frame and it doesn't work with cummin(.SD)<0 which is a matrix. So another solution is to cast the matrix as a data.frame :

dt[, (as.data.frame(cummin(.SD)<0)), by=id]
   id    V1    V2
1:  1 FALSE FALSE
2:  1  TRUE FALSE
3:  1  TRUE  TRUE
4:  2  TRUE FALSE
5:  2  TRUE FALSE

Answer 2

You can try something like this :

dt2 <- dt[, cummin(.SD), by = id]
results <- data.table(dt2[, .SD<0])
results$id <- dt$id

results
   id    V1    V2
1:  1 FALSE FALSE
2:  1  TRUE FALSE
3:  1  TRUE  TRUE
4:  2  TRUE FALSE
5:  2  TRUE FALSE

logical operators in data.table with by-groups

Question

2 answers

solution1
3 ACCPTED 2016-11-03 18:56:19

solution2
1 2016-11-03 18:52:54

logical operators in data.table with by-groups

Question

2 answers

solution1 3 ACCPTED 2016-11-03 18:56:19

solution2 1 2016-11-03 18:52:54

solution1
3 ACCPTED 2016-11-03 18:56:19

solution2
1 2016-11-03 18:52:54