简体   繁体   中英

Shifting the last non-NA value by id

I have a data table that looks like this:

DT<-data.table(day=c(1,2,3,4,5,6,7,8),Consumption=c(5,9,10,2,NA,NA,NA,NA),id=c(1,2,3,1,1,2,2,1))

   day Consumption id
1:   1           5  1
2:   2           9  2
3:   3          10  3
4:   4           2  1
5:   5          NA  1
6:   6          NA  2
7:   7          NA  2
8:   8          NA  1

I want to create two columns that show the last non-Na consumption value before the observation, and the day difference between those observations using the id groups. So far, I tried this:

DT[, j := day-shift(day, fill = NA,n=1), by = id]
DT[, yj := shift(Consumption, fill = NA,n=1), by = id]

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  1 NA
8:   8          NA  1  3 NA 

However, I want that the lagged consumption values with n=1 come from the rows which have non-NA consumption values. For example, in the 7th row and column "yj", the yj value is NA because it comes from the 6th row which has NA consumption. I want it to come from the 2nd row. Therefore, I would like the end up with this data table:

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  5  9
8:   8          NA  1  4  2

Note: The reason for specifically using the parameter n of shift function is that I will also need the 2nd last non-Na consumption values in the next step.

Thank You

Here's a solution with an assist from :

library(data.table)
library(zoo)

DT[, `:=`(day_shift = shift(day),
          yj = shift(Consumption)),
   by = id]

#make the NA yj records NA for the days
DT[is.na(yj), day_shift := NA_integer_]

#fill the DT with the last non-NA value
DT[,
   `:=`(day_shift = na.locf(day_shift, na.rm = F),
          yj = zoo::na.locf(yj, na.rm = F)),
   by = id]

# finally calculate j
DT[, j:= day - day_shift]

# you can clean up the ordering or remove columns later
DT

   day Consumption id day_shift yj  j
1:   1           5  1        NA NA NA
2:   2           9  2        NA NA NA
3:   3          10  3        NA NA NA
4:   4           2  1         1  5  3
5:   5          NA  1         4  2  1
6:   6          NA  2         2  9  4
7:   7          NA  2         2  9  5
8:   8          NA  1         4  2  4

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM