
Filling NA values with a sequence in R data.table

I have a data table that looks something like the following. Note that the flag is 1 when vals is 0 and missing elsewhere.

library(data.table)

dt <- data.table(vals = c(0,2,4,1,0,4,3,0,3,4))
dt[vals == 0, flag := 1]

 > dt
    vals flag
 1:    0    1
 2:    2   NA
 3:    4   NA
 4:    1   NA
 5:    0    1
 6:    4   NA
 7:    3   NA
 8:    0    1
 9:    3   NA
10:    4   NA

I would like the output to look like the seq column below. That is, the column should contain a set of sequences, each starting at 1 on a row where vals is 0 and counting up until the next row where vals is 0. The flag column only matters if it helps reach that goal.

 > dt
    vals  seq
 1:    0    1
 2:    2    2
 3:    4    3
 4:    1    4
 5:    0    1
 6:    4    2
 7:    3    3
 8:    0    1
 9:    3    2
10:    4    3

Originally, I was thinking about using cumsum() somehow, but I can't figure out how to use it effectively.

My current solution is pretty ugly.

dt <- data.table(vals = c(0,2,4,1,0,4,3,0,3,4))
dt[vals == 0, flag := 1]
dt[, flag_rleid := rleid(flag)]

# group on the flag_rleid column
dt[, flag_seq := seq_len(.N), by = flag_rleid]
# hideous subsetting to avoid incrementing the first appearance of a 1
dt[vals != 0, flag_seq := flag_seq + 1]

# flag_seq is the desired column
> dt
    vals flag flag_rleid flag_seq
 1:    0    1          1        1
 2:    2   NA          2        2
 3:    4   NA          2        3
 4:    1   NA          2        4
 5:    0    1          3        1
 6:    4   NA          4        2
 7:    3   NA          4        3
 8:    0    1          5        1
 9:    3   NA          6        2
10:    4   NA          6        3

Any improvements are appreciated.

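We can apply cumsum() to the logical vector vals == 0 to create a grouping variable: the running total goes up by one at every 0, so each run of rows gets its own group id. Grouping by that, seq_len(.N) gives the sequence column.

dt[, flag_seq := seq_len(.N), by = cumsum(vals == 0)]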
dt
#    vals flag flag_seq
# 1:    0    1        1
# 2:    2   NA        2
# 3:    4   NA        3
# 4:    1   NA        4
# 5:    0    1        1
# 6:    4   NA        2
# 7:    3   NA        3
# 8:    0    1        1
# 9:    3   NA        2
#10:    4   NA        3
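
To make the grouping step easier to follow, here is a minimal sketch that stores the cumsum() result in a helper column before counting within it. The column names grp and seq_rowid are only illustrative and are not part of the original question or answer. The last line shows a possible alternative using data.table's rowid(), which numbers rows within each distinct value of its argument; treat it as a suggestion, not as the answerer's method.

```r
library(data.table)

dt <- data.table(vals = c(0, 2, 4, 1, 0, 4, 3, 0, 3, 4))

# vals == 0 is TRUE on each row that starts a new run; cumsum() turns
# those TRUEs into a running group id (grp is a hypothetical helper column)
dt[, grp := cumsum(vals == 0)]

# Count rows within each group, restarting at 1 for every new group
dt[, flag_seq := seq_len(.N), by = grp]

# Possible alternative without the helper column or a by clause:
# rowid() returns the within-group row number for each value of its input
dt[, seq_rowid := rowid(cumsum(vals == 0))]

print(dt)
```

If vals does not start with a 0, the leading rows simply fall into group 0 and are still numbered from 1.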


 