R data.table: keep column when grouping by expression

When grouping by an expression involving a column (e.g. DT[, .SD[c(1, .N)], by = expression(col)]), I want to keep the value of col in .SD.

For example, in the following I group by the remainder of a divided by 3 and keep the first and last observation in each group. However, a is no longer present in .SD:

library(data.table)

f <- function(x) x %% 3

Q <- data.table(a = 1:20, x = rnorm(20), y = rnorm(20))
Q[, .SD[c(1, .N)], by = f(a)]

   f         x          y
1: 1 0.2597929  1.0256259
2: 1 2.1106619 -1.4375193
3: 2 1.2862501  0.7918292
4: 2 0.6600591 -0.5827745
5: 0 1.3758503  1.3122561
6: 0 2.6501140  1.9394756

The desired output is as if I had done the following:

Q[, f := f(a)]
tmp <- Q[, .SD[c(1, .N)], by=f]
Q[, f := NULL]
tmp[, f := NULL]
tmp

    a         x          y
1:  1 0.2597929  1.0256259
2: 19 2.1106619 -1.4375193
3:  2 1.2862501  0.7918292
4: 20 0.6600591 -0.5827745
5:  3 1.3758503  1.3122561
6: 18 2.6501140  1.9394756

Is there a way to do this directly, without adding a temporary column or creating an intermediate data.table?

Instead of .SD, use .I to get the row indices of the first and last row in each group, extract that column ($V1), and use it to subset the original data.table:

library(data.table)
Q[Q[, .I[c(1, .N)], by = f(a)]$V1]
#    a          x          y
#1:  1  0.7265238  0.5631753
#2: 19  1.7110611 -0.3141118
#3:  2  0.1643566 -0.4704501
#4: 20  0.5182394 -0.1309016
#5:  3 -0.6039137  0.1349981
#6: 18  0.3094155 -1.1892190

NOTE: The values in columns x and y differ from those in the question because no seed was set with set.seed().
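
To make the two steps of the .I approach easier to follow, here is a minimal sketch that splits them apart and compares the result with the workaround from the question. The seed value and the names idx, direct and tmp are illustrative assumptions, not part of the original post.

```r
library(data.table)

set.seed(1)  # assumption: any fixed seed, used only for reproducibility
f <- function(x) x %% 3
Q <- data.table(a = 1:20, x = rnorm(20), y = rnorm(20))

# Step 1: .I holds each row's position in Q, so this returns the row
# numbers of the first and last row per group (result column is V1).
idx <- Q[, .I[c(1, .N)], by = f(a)]$V1
print(idx)  # 1 19 2 20 3 18

# Step 2: subset Q by those row numbers; every column, including a, is kept.
direct <- Q[idx]

# The question's workaround, run on a copy so that Q itself is not modified.
tmp <- copy(Q)[, f := f(a)][, .SD[c(1, .N)], by = f][, f := NULL]

# Compare the two results column by column.
same <- identical(direct$a, tmp$a) &&
  identical(direct$x, tmp$x) &&
  identical(direct$y, tmp$y)
print(same)
```

Because the grouping step only returns row numbers, the grouping expression never has to be stored as a column, and the final subset is taken from the untouched original table.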
