In a data.table, I want to flag N randomly selected rows per group (C1) in a new column C3. Several similar questions have already been asked on SO here , here and here . But based on their answers I still cannot figure out a solution for my task.
library(data.table)
set.seed(1)
dt = data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                C2 = c(2,1,3,1,2,3,4,5,4,5))
dt
C1 C2
1: A 2
2: A 1
3: A 3
4: B 1
5: C 2
6: C 3
7: C 4
8: D 5
9: D 4
10: D 5
Here are the row indices of two randomly selected rows per group C1 (this doesn't work well for group B):
dt[, sample(.I, min(.N, 2)), by = C1]$V1
[1] 1 3 3 7 5 10 9
NB: for B only one row (row 4) should be selected, because group B consists of one row only. In the output above the index returned for B is 3, which belongs to group A.
Here is a solution for one randomly selected row in each group, which often doesn't work for group B:
dt[, C3 := .I == sample(.I, 1), by = C1]
dt
C1 C2 C3
1: A 2 FALSE
2: A 1 TRUE
3: A 3 FALSE
4: B 1 FALSE
5: C 2 TRUE
6: C 3 FALSE
7: C 4 FALSE
8: D 5 TRUE
9: D 4 FALSE
10: D 5 FALSE
What I actually want is to extend this to N rows. I've tried (for two rows):
dt[, C3 := .I==sample(.I, min(.N, 2)), by = C1]
which of course doesn't work.
Any help is much appreciated!
dt[, C3 := 1:.N %in% sample(.N, min(.N, 2)), by = C1]
Or use head, but I think that should be slower:
dt[, C3 := 1:.N %in% head(sample(.N), 2) , by = C1]
If the number of flagged rows is not the same for every group, you can do:
flagsz <- c(2, 1, 2, 3)
dt[, C3 := 1:.N %in% sample(.N, min(.N, flagsz[.GRP])), by = C1]
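The reason these variants handle group B correctly, as far as I can tell, is that they sample positions within the group (sample(.N, ...)) rather than the global row numbers in .I. Base R's sample(x, size) treats a single number x >= 1 as 1:x, so for the one-row group B (where .I is 4) sample(.I, 1) draws from 1:4. A minimal sketch of the difference, where draws and idx are illustrative names that are not part of the question:

```r
library(data.table)

dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

# sample() on a length-one numeric x >= 1 samples from 1:x,
# so this can return any of 1, 2, 3, 4 rather than always 4.
set.seed(1)
draws <- replicate(50, sample(4L, 1))
table(draws)

# Sampling within-group positions and mapping them back through .I
# always yields row 4 for the single-row group B.
idx <- dt[, .I[sample(.N, min(.N, 2))], by = C1]$V1
idx
```

The same .I[sample(.N, ...)] pattern gives the global row indices the question was after, in case the indices are needed rather than a logical flag.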
N=2
dt[, C3 := {if (.N < N) rep(TRUE,.N) else 1:.N %in% sample(.N,N) }, by=C1]
dt
# C1 C2 C3
# 1: A 2 TRUE
# 2: A 1 FALSE
# 3: A 3 TRUE
# 4: B 1 TRUE
# 5: C 2 FALSE
# 6: C 3 TRUE
# 7: C 4 TRUE
# 8: D 5 TRUE
# 9: D 4 TRUE
# 10: D 5 FALSE
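A quick way to check any of these approaches is to count the flagged rows per group and compare the count with min(.N, N). This is only a sketch: it rebuilds dt from the question, uses one of the variants shown above to create C3, and chk is an illustrative name.

```r
library(data.table)

dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))
N <- 2

# Flag up to N randomly chosen rows in each group
dt[, C3 := seq_len(.N) %in% sample(.N, min(.N, N)), by = C1]

# Count flagged rows per group and compare with the expected number
chk <- dt[, .(flagged = sum(C3), expected = min(.N, N)), by = C1]
chk

# Raises an error if any group has the wrong number of flagged rows
stopifnot(all(chk$flagged == chk$expected))
```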