Flag randomly selected N rows by group in data.table

Question

In a data.table, I want to add a column C3 that flags N randomly selected rows within each group (C1). Several similar questions have already been asked on SO (here, here and here), but based on their answers I still cannot figure out a solution for my task.

library(data.table)
set.seed(1)
dt = data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                C2 = c(2,1,3,1,2,3,4,5,4,5))

dt
    C1 C2
 1:  A  2
 2:  A  1
 3:  A  3
 4:  B  1
 5:  C  2
 6:  C  3
 7:  C  4
 8:  D  5
 9:  D  4
10:  D  5

Here are the row indexes of two randomly selected rows per group of C1 (this doesn't work well for group B):

dt[, sample(.I, min(.N, 2)), by = C1]$V1
[1]  1  3  3  7  5 10  9

NB: for B only one row should be selected, because group B consists of a single row.
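The group B problem comes from a documented quirk of base R's sample(): when x is a single number of at least 1, it samples from 1:x instead of from x itself. For group B, .I is just 4, so sample(.I, 1) draws from 1:4. A minimal sketch of the quirk follows; safe_sample is a hypothetical helper written for illustration, not part of data.table or base R.

```r
set.seed(1)

# A length-one numeric x is treated as 1:x, so any of 1..4 can come back
draws <- replicate(100, sample(4, 1))
range(draws)

# Hypothetical helper: sample positions, then index into x,
# so a single-element x is always returned as itself
safe_sample <- function(x, size) x[sample.int(length(x), size)]
safe_draws <- replicate(100, safe_sample(4, 1))
unique(safe_draws)
```

Sampling positions within the group (1 to .N) instead of global row indexes sidesteps the issue in the same way.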

Here is a solution for one randomly selected row in each group, which often doesn't work for group B:

dt[, C3 := .I == sample(.I, 1), by = C1]
dt
    C1 C2    C3
 1:  A  2 FALSE
 2:  A  1  TRUE
 3:  A  3 FALSE
 4:  B  1 FALSE
 5:  C  2  TRUE
 6:  C  3 FALSE
 7:  C  4 FALSE
 8:  D  5  TRUE
 9:  D  4 FALSE
10:  D  5 FALSE

Actually, I want to extend this to N rows. I've tried (for two rows):

dt[, C3 := .I==sample(.I, min(.N, 2)), by = C1]

which of course doesn't work.
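The reason it fails, as far as I can tell: == compares element by element and recycles the shorter vector, so it never tests whether a row index is among the sampled ones. Membership needs %in%. A small sketch with made-up values (idx and picked are illustrative and not taken from the table above):

```r
idx    <- 5:7          # row indexes of one group, as .I would supply them
picked <- c(6L, 5L)    # two rows chosen for flagging (fixed here for clarity)

# == recycles `picked` to 6, 5, 6 and compares position by position,
# with a warning about mismatched lengths (suppressed here)
eq_flags <- suppressWarnings(idx == picked)
eq_flags   # FALSE FALSE FALSE

# %in% asks, for each index, whether it appears anywhere in `picked`
in_flags <- idx %in% picked
in_flags   # TRUE TRUE FALSE
```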

Any help is much appreciated!

Answer 1

dt[, C3 := 1:.N %in% sample(.N, min(.N, 2)), by = C1]

Or use head, but I think that should be slower:

dt[, C3 := 1:.N %in% head(sample(.N), 2) , by = C1]

If the number of flagged rows is not constant, you can do:

flagsz <- c(2, 1, 2, 3)
dt[, C3 := 1:.N %in% sample(.N, min(.N, flagsz[.GRP])), by = C1]
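Note that flagsz[.GRP] relies on the sizes being listed in the order the groups first appear. If that ordering is fragile, one possible variation is to name the vector and look the size up by group label through .BY. This is only a sketch under that assumption; the sizes themselves are the same illustrative values as above.

```r
library(data.table)
set.seed(1)
dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

# Per-group sizes keyed by group label instead of group position
flagsz <- c(A = 2, B = 1, C = 2, D = 3)

# .BY$C1 is the current group's label; min() caps the size at the group length
dt[, C3 := seq_len(.N) %in% sample.int(.N, min(.N, flagsz[[.BY$C1]])), by = C1]

# Count of flagged rows per group: 2, 1, 2, 3 for A, B, C, D
counts <- dt[, .(flagged = sum(C3)), by = C1]
counts
```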

Answer 2

N = 2
dt[, C3 := {if (.N < N) rep(TRUE, .N) else 1:.N %in% sample(.N, N)}, by = C1]
dt
#     C1 C2    C3
#  1:  A  2  TRUE
#  2:  A  1 FALSE
#  3:  A  3  TRUE
#  4:  B  1  TRUE
#  5:  C  2 FALSE
#  6:  C  3  TRUE
#  7:  C  4  TRUE
#  8:  D  5  TRUE
#  9:  D  4  TRUE
# 10:  D  5 FALSE

2 answers

solution1
1 ACCEPTED 2018-05-11 14:33:14

solution2
1 2018-05-11 14:37:19
