简体   繁体   中英

logical operators in data.table with by-groups

I am trying to apply a logical operator to columns of data.table, which works fine without by= groups, but with by= groups something weird is happening. I'm using R 3.3.1 and data.table 1.9.6 (off CRAN).

Here is an example data set:

library(data.table)
# x <- matrix(rnorm(10),ncol=2)
x <- structure(c(0.323618333400921, -0.103619160085676, -0.823322722581548, 
-0.0175726272468397, 2.17286179204364, 1.63465793636681, 0.258469984285221, 
-0.411622451584, 0.22594247704226, 0.114427086028551), .Dim = c(5L, 2L))

dt <- data.table(x)
dt$id <- c(1,1,1,2,2)
dt

##             V1         V2 id
## 1:  0.32361833  1.6346579  1
## 2: -0.10361916  0.2584700  1
## 3: -0.82332272 -0.4116225  1
## 4: -0.01757263  0.2259425  2
## 5:  2.17286179  0.1144271  2

Let's say I would like to know if cumulative minimum (that is, the minimal value in each column either from the start of sample to current row, or from the start of the by= group to the current row) is below zero. When there are no by= groups, we are good:

dt[, cummin(.SD), .SDcols=1:2]
##            V1         V2
## 1:  0.3236183  1.6346579
## 2: -0.1036192  0.2584700
## 3: -0.8233227 -0.4116225
## 4: -0.8233227 -0.4116225
## 5: -0.8233227 -0.4116225

> dt[, cummin(.SD)<0, .SDcols=1:2]
##         V1    V2
## [1,] FALSE FALSE
## [2,]  TRUE FALSE
## [3,]  TRUE  TRUE
## [4,]  TRUE  TRUE
## [5,]  TRUE  TRUE

So far so good. I can also compute actual cumulative minimums with by= groups:

dt[, cummin(.SD), by=id]
##    id          V1         V2
## 1:  1  0.32361833  1.6346579
## 2:  1 -0.10361916  0.2584700
## 3:  1 -0.82332272 -0.4116225
## 4:  2 -0.01757263  0.2259425
## 5:  2 -0.01757263  0.1144271

However, if I try to compute the indicator variables as before, the output is as follows:

dt[, (cummin(.SD)<0), by=id]
##     id    V1
##  1:  1 FALSE
##  2:  1  TRUE
##  3:  1  TRUE
##  4:  1 FALSE
##  5:  1 FALSE
##  6:  1  TRUE
##  7:  2  TRUE
##  8:  2  TRUE
##  9:  2 FALSE
## 10:  2 FALSE

Instead of 2 variables of 5 obs. each, we have one with 10. It appears that within each by-group, the variables are stacked on top of each other. How should I do this correctly?

It will work with lapply :

dt[,lapply(.SD,function(x) cummin(x)<0),by="id"]
   id    V1    V2
1:  1 FALSE FALSE
2:  1  TRUE FALSE
3:  1  TRUE  TRUE
4:  2  TRUE FALSE
5:  2  TRUE FALSE

As commented by David Arenburg, it seems that it works with cummin(.SD) because it is a data.frame and it doesn't work with cummin(.SD)<0 which is a matrix. So another solution is to cast the matrix as a data.frame :

dt[, (as.data.frame(cummin(.SD)<0)), by=id]
   id    V1    V2
1:  1 FALSE FALSE
2:  1  TRUE FALSE
3:  1  TRUE  TRUE
4:  2  TRUE FALSE
5:  2  TRUE FALSE

You can try something like this :

dt2 <- dt[, cummin(.SD), by = id]
results <- data.table(dt2[, .SD<0])
results$id <- dt$id

results
   id    V1    V2
1:  1 FALSE FALSE
2:  1  TRUE FALSE
3:  1  TRUE  TRUE
4:  2  TRUE FALSE
5:  2  TRUE FALSE

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM