With data similar to this:
library(data.table)
dt <- data.table(id = c("a","a","b","b","b","c","c","c","c","d","d","d","d","d"),
                 quantity = c(6,6,7,7,7,8,8,1,1,9,9,9,2,2))
threshold <- 3
id quantity
1: a 6
2: a 6
3: b 7
4: b 7
5: b 7
6: c 8
7: c 8
8: c 1
9: c 1
10: d 9
11: d 9
12: d 9
13: d 2
14: d 2
I would like to subset in two ways:
First subset: keep every row of each id in which some quantity value occurs at least threshold times (here, 3 times). The output should look like this:
id quantity
1: b 7
2: b 7
3: b 7
4: d 9
5: d 9
6: d 9
7: d 2
8: d 2
Second subset: keep only the rows whose quantity value occurs at least threshold times (here, 3 times) within its id. The output should look like this:
id quantity
1: b 7
2: b 7
3: b 7
4: d 9
5: d 9
6: d 9
Thanks so much.
# normally I'd use .SD, not .I, but you don't have anything else in your table
second = dt[, if (.N >= threshold) .I, by = .(id, quantity)][, -"V1"]
first = dt[unique(second$id), on = 'id']
For the first subset, you could do:
dt[id %in% dt[, .N, by = .(id, quantity)][N >= threshold, unique(id)]]
which gives:
id quantity
1: b 7
2: b 7
3: b 7
4: d 9
5: d 9
6: d 9
7: d 2
8: d 2
And for the second subset:
dt[dt[, .N, by = .(id, quantity)][N >= threshold, .(id, quantity)]
, on = .(id, quantity)]
which gives:
id quantity
1: b 7
2: b 7
3: b 7
4: d 9
5: d 9
6: d 9
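The same counting idea can also be written with helper columns instead of a join. This is only a sketch of an alternative, not the method from the answer above: the names tmp, n, keep_id, first_subset and second_subset are made up for illustration, and it works on a copy so that dt itself is not modified by reference.

```r
library(data.table)

dt <- data.table(
  id = c("a","a","b","b","b","c","c","c","c","d","d","d","d","d"),
  quantity = c(6,6,7,7,7,8,8,1,1,9,9,9,2,2)
)
threshold <- 3

# Work on a copy so the original table is left untouched
tmp <- copy(dt)

# n: how many rows share this (id, quantity) combination
tmp[, n := .N, by = .(id, quantity)]

# keep_id: TRUE for every row of an id that has at least one qualifying combination
tmp[, keep_id := any(n >= threshold), by = id]

# First subset: all rows of the qualifying ids
first_subset <- tmp[keep_id == TRUE, .(id, quantity)]

# Second subset: only the rows of the qualifying (id, quantity) combinations
second_subset <- tmp[n >= threshold, .(id, quantity)]
```

Both subsets come from a single pass over the grouped counts, which may be easier to read when you need the two results side by side.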
Playing with base::rle():
First subset:
dt[, .SD[max(rle(quantity)[["lengths"]]) >= threshold], id]
id quantity
1: b 7
2: b 7
3: b 7
4: d 9
5: d 9
6: d 9
7: d 2
8: d 2
Second subset:
dt[,{
tmp <- rle(quantity)
ind <- tmp[["lengths"]] >= threshold
rep(tmp[["values"]][ind], tmp[["lengths"]][ind])
},
by = id]
id V1
1: b 7
2: b 7
3: b 7
4: d 9
5: d 9
6: d 9
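One thing to keep in mind with the rle() approach: rle() measures runs of consecutive equal values, so it agrees with a plain count only when equal quantity values sit next to each other within an id, as they do in the example data. The sketch below uses a small made-up table (x, by_count and by_run are illustrative names, not part of the question) where the two notions differ.

```r
library(data.table)

# Hypothetical data: the value 7 occurs three times, but not consecutively
x <- data.table(id = "x", quantity = c(7, 1, 7, 7))
threshold <- 3

# Count-based rule: 7 appears 3 times within id "x", so the id qualifies
by_count <- x[, .N, by = .(id, quantity)][N >= threshold]

# Run-based rule: the longest run of equal values has length 2, so nothing qualifies
by_run <- x[, .SD[max(rle(quantity)[["lengths"]]) >= threshold], by = id]

nrow(by_count)
nrow(by_run)
```

If the values are not guaranteed to be adjacent, sorting first (for example with setorder(x, id, quantity)) should make the run lengths match the counts, at the cost of changing the row order.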