Tag first (or n-th) observation of group

I have the following data:

library(data.table)
DT = data.table(ID = c(1, 1, 2, 3, 3, 3), Y = c(2001, 2002, 1999, 2001, 2001, 2002))

DT
   ID    Y
1:  1 2001
2:  1 2002
3:  2 1999
4:  3 2001
5:  3 2001
6:  3 2002

The dataset is keyed by ID and Y. I would like to create a variable first that equals 1 for the first observation of each group, the data.table way, with groups defined by the key:

DT
   ID    Y first
1:  1 2001     1
2:  1 2002     0
3:  2 1999     1
4:  3 2001     1
5:  3 2001     1
6:  3 2002     0

I was trying to do something with .I[1L] but couldn't figure it out. As a bonus question, how would I create such a variable for the n-th observation (assuming that n < max number of obs. in all groups)? Thank you all!

Maybe you can try head():

DT[,first := +(Y==head(Y,1)), by = ID]

or a more compact version (thanks to @akrun):

DT[, first := +(Y == Y[1]), ID]

or a more general version using nth() from dplyr, which extends to the n-th observation (thanks again to @akrun):

library(dplyr)
DT[, first := +(Y %in% nth(Y, 1)), by = ID]

which gives

> DT
   ID    Y first
1:  1 2001     1
2:  1 2002     0
3:  2 1999     1
4:  3 2001     1
5:  3 2001     1
6:  3 2002     0
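
For the bonus question, one possible base-R variant of the same idea is to compare Y against the n-th distinct value within each ID. This is only a sketch: the column name nth_flag and the choice n = 2 are assumptions made for illustration, and "n-th observation" is read here as "n-th distinct Y within the group".

```r
library(data.table)

DT <- data.table(ID = c(1, 1, 2, 3, 3, 3),
                 Y  = c(2001, 2002, 1999, 2001, 2001, 2002))

n <- 2  # assumed: which distinct Y value within each ID to tag

# unique(Y)[n] is NA when a group has fewer than n distinct values;
# %in% (rather than ==) then yields FALSE, so the flag is 0 instead of NA.
DT[, nth_flag := +(Y %in% unique(Y)[n]), by = ID]

DT
```

With n = 1 this should reproduce the first column shown above. Groups with fewer than n distinct values (here ID 2) simply get 0 everywhere.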

We can use rleid:

library(data.table)
n <- 1

DT[, first := as.integer(rleid(Y) == n), ID]

#   ID    Y first
#1:  1 2001     1
#2:  1 2002     0
#3:  2 1999     1
#4:  3 2001     1
#5:  3 2001     1
#6:  3 2002     0
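
Because rleid numbers consecutive runs of equal values, the approach relies on equal Y values being adjacent within each ID. A minimal sketch for n = 2, assuming it is acceptable to sort the table by reference with setkey first (the column name second is hypothetical):

```r
library(data.table)

DT <- data.table(ID = c(1, 1, 2, 3, 3, 3),
                 Y  = c(2001, 2002, 1999, 2001, 2001, 2002))

# Sort by ID, then Y, so that each run of equal Y values within an ID
# corresponds to exactly one distinct value.
setkey(DT, ID, Y)

n <- 2  # assumed: tag the second run of Y within each ID

# rleid(Y) restarts at 1 in every ID group; rows in the n-th run get 1.
DT[, second := as.integer(rleid(Y) == n), by = ID]

DT
```

Groups that have fewer than n runs never reach that run id, so all of their rows get 0.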
