I have a data table that looks something like the following. Note that the flag is 1 when vals is 0 and missing elsewhere.
library(data.table)
dt <- data.table(vals = c(0, 2, 4, 1, 0, 4, 3, 0, 3, 4))
dt[vals == 0, flag := 1]
> dt
    vals flag
 1:    0    1
 2:    2   NA
 3:    4   NA
 4:    1   NA
 5:    0    1
 6:    4   NA
 7:    3   NA
 8:    0    1
 9:    3   NA
10:    4   NA
I would like the output to look like the seq column below. That is, the column needs to contain a set of sequences beginning at 1 whenever vals is 0 and counting up until the next row where vals is 0. The flag is only helpful if it helps attain the goal described.
> dt
    vals seq
 1:    0   1
 2:    2   2
 3:    4   3
 4:    1   4
 5:    0   1
 6:    4   2
 7:    3   3
 8:    0   1
 9:    3   2
10:    4   3
Originally, I was thinking about using cumsum() somehow, but I can't figure out how to use it effectively.
My current solution is pretty ugly.
dt <- data.table(vals = c(0,2,4,1,0,4,3,0,3,4))
dt[vals == 0, flag := 1]
dt[, flag_rleid := rleid(flag)]
# group on the flag_rleid column
dt[, flag_seq := seq_len(.N), by = flag_rleid]
# hideous subsetting to avoid incrementing the first appearance of a 1
dt[vals != 0, flag_seq := flag_seq + 1]
# flag_seq is the desired column
> dt
    vals flag flag_rleid flag_seq
 1:    0    1          1        1
 2:    2   NA          2        2
 3:    4   NA          2        3
 4:    1   NA          2        4
 5:    0    1          3        1
 6:    4   NA          4        2
 7:    3   NA          4        3
 8:    0    1          5        1
 9:    3   NA          6        2
10:    4   NA          6        3
Any improvements are appreciated.
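We can use a logical index with cumsum to create the grouping variable: vals == 0 is TRUE on every row that starts a new run, so its cumulative sum increases by 1 at each of those rows. Grouping by that, we get the sequence column with seq_len(.N).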
dt[, flag_seq := seq_len(.N), by = cumsum(vals == 0)]
dt
#    vals flag flag_seq
# 1:    0    1        1
# 2:    2   NA        2
# 3:    4   NA        3
# 4:    1   NA        4
# 5:    0    1        1
# 6:    4   NA        2
# 7:    3   NA        3
# 8:    0    1        1
# 9:    3   NA        2
#10:    4   NA        3
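To see why the grouping works, it can help to store the cumsum result as its own column before numbering the rows. The following is only a sketch on the data from the question; the column names grp and flag_seq2 are illustrative assumptions and not part of the original answer. It also shows rowid(), a data.table helper that should give the same within-group counter without a by clause.

```r
library(data.table)

dt <- data.table(vals = c(0, 2, 4, 1, 0, 4, 3, 0, 3, 4))

# vals == 0 is TRUE on each row that starts a new run; cumsum() of that
# logical vector steps up by 1 at every such row, giving one id per run.
# (grp is a hypothetical helper column, kept only for illustration.)
dt[, grp := cumsum(vals == 0)]
# grp is 1 1 1 1 2 2 2 3 3 3

# Number the rows within each run.
dt[, flag_seq := seq_len(.N), by = grp]

# Alternative: rowid() counts occurrences of each grp value in order,
# which is the same within-group sequence.
dt[, flag_seq2 := rowid(grp)]

print(dt)
```

If the first row of vals were not 0, those leading rows would fall into group 0 and would still be numbered from 1, which may or may not be what is wanted.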