How to add the lag and lead to each observation for several variables, excluding NAs, within data.table?

I have a data.table similar to this:

library(data.table)
mydt <- data.table(id = LETTERS[1:6], x = 1:6, y = 2:3) 
> mydt
   id x y
1:  A 1 2
2:  B 2 3
3:  C 3 2
4:  D 4 3
5:  E 5 2
6:  F 6 3

I would like to replace each value column with the sum of the lag, the current value, and the lead of each observation (i.e. x[i-1] + x[i] + x[i+1]). I can do something like this with the amazing shift() function.

cols <- c('x', 'y')
mydt[
    ,
    (cols) := shift(.SD, 1) + .SD + shift(.SD, 1, type = 'lead'),
    .SDcols = cols
][]
   id  x  y
1:  A NA NA
2:  B  6  7
3:  C  9  8
4:  D 12  7
5:  E 15  8
6:  F NA NA

But this introduces NAs for the rows that have no lag or lead value. How can I modify the calculation so that these rows use only the two available values (like na.rm = TRUE)? The desired output would be:

   id  x  y
1:  A  3  5
2:  B  6  7
3:  C  9  8
4:  D 12  7
5:  E 15  8
6:  F 11  5

I tried using sum(..., na.rm = TRUE) instead of the + operator, but that gives an error:

Error in sum(shift(.SD, 1), .SD, shift(.SD, 1, type = "lead"), na.rm = TRUE) : invalid 'type' (list) of argument

I also tried the following, but that apparently gives a different result.

mydt[
    ,
    (cols) := lapply(
        .SD, 
        function(x) sum(shift(x, 1), x, shift(x, 1, type = 'lead'), na.rm = TRUE)
    ),
    .SDcols = cols
][]
   id   x  y
1:  A 126 90
2:  B 126 90
3:  C 126 90
4:  D 126 90
5:  E 126 90
6:  F 126 90
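
For reference, the constant columns arise because sum() collapses all of its arguments into a single number, which := then recycles down the column. The values 126 and 90 are consistent with the call having been run on the table already modified by the earlier := (x being NA, 6, 9, 12, 15, NA sums to 42, three times over). A minimal sketch of an element-wise alternative, assuming a freshly created table and using base R's rowSums() over the three aligned vectors:

```r
library(data.table)

# Fresh copy of the example table, since := modifies mydt by reference.
mydt <- data.table(id = LETTERS[1:6], x = 1:6, y = rep(2:3, 3))
cols <- c("x", "y")

# cbind() lines up lag, value and lead as three columns; rowSums() then
# returns one sum per row and drops the NAs at the edges.
mydt[
  ,
  (cols) := lapply(.SD, function(x) {
    rowSums(cbind(shift(x, 1), x, shift(x, 1, type = "lead")), na.rm = TRUE)
  }),
  .SDcols = cols
][]
```

Unlike padding with a fill value, this variant also skips any NA that occurs inside the data itself; note that rowSums() returns doubles, so the integer columns become numeric.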

As @akrun and @DavidArenburg pointed out, the shift() function has a fill argument, which solves the issue: padding the missing lag and lead values with 0 instead of NA leaves the sums at the edges intact.

# Run on a freshly created mydt, since the earlier := calls modified it by reference.
cols <- c('x', 'y')
mydt[
    ,
    (cols) := shift(.SD, 1, fill = 0) + .SD + shift(.SD, 1, type = 'lead', fill = 0),
    .SDcols = cols
][]
   id  x y
1:  A  3 5
2:  B  6 7
3:  C  9 8
4:  D 12 7
5:  E 15 8
6:  F 11 5
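
If a wider window is needed, the same fill = 0 idea can be wrapped in a small function. The sketch below is only an illustration: window_sum() and its k argument are hypothetical names, not part of data.table, and it assumes the columns themselves contain no NA.

```r
library(data.table)

# Hypothetical helper (not part of data.table): adds to each value its k
# previous and k next neighbours, counting neighbours beyond the edges as 0.
window_sum <- function(x, k = 1L) {
  # Wrapping x in list() makes shift() return a list even when k is 1.
  lags  <- shift(list(x), n = seq_len(k), type = "lag",  fill = 0)
  leads <- shift(list(x), n = seq_len(k), type = "lead", fill = 0)
  Reduce(`+`, c(list(x), lags, leads))
}

mydt <- data.table(id = LETTERS[1:6], x = 1:6, y = rep(2:3, 3))
cols <- c("x", "y")
mydt[, (cols) := lapply(.SD, window_sum, k = 1L), .SDcols = cols][]
```

With k = 1L this reproduces the result above. Because fill only pads the positions created by shifting, an NA that is already present in a column would still propagate through the + operator.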
