Avoiding NA in rolling sums of last n observations within by groups using data.table
According to this thread, rolling sums for variable b in the following data.table can be computed as follows:
Data creation + computing rolling sums:
library(data.table)
x <- data.table(a = sample(letters[1:3], 100, replace = TRUE), b = runif(100))
setorder(x, a)
# alternative 1
x[, .(b, Reduce(`+`, shift(b, 0:2))), by = a]
# alternative 2
x[, .(b, stats::filter(b, rep(1, 3), sides = 1)), by = a]
Current + desired output:
a b V2 V2_desired
1: a 0.457665568 NA 0.457665568
2: a 0.752555834 NA 1.210221
3: a 0.864672124 2.0748935 2.0748935
4: a 0.542168656 2.1593966 2.1593966
5: a 0.197962875 1.6048037 1.6048037
Now NAs are generated for the first two observations in each by group. I need to adjust one of the alternatives so that it sums only the current observation when the row is first in its group, and only the last two observations when the row is at position 2. This should generalize so that I can use windows of the last n values, with these edge cases handled.
Any idea?
I'm not 100% sure I'm getting what you need, but the shift function leaves behind NA values by default. You can change that behaviour by passing a fill argument. In your case, since you're summing the data, you might want to try it with fill=0:
set.seed( 123 )
x[, .(b, Reduce(`+`, shift(b, 0:2, fill=0))), by = a]
head returns:
a b V2
1: a 0.5999890 0.599989
2: a 0.8903502 1.490339
3: a 0.7205963 2.210935
4: a 0.5492847 2.160231
5: a 0.9540912 2.223972
6: a 0.5854834 2.088859
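To generalize this to a window of the last n values, one possible sketch is a small helper built on cumulative sums: the sum of the last n values at position i is the cumulative sum at i minus the cumulative sum at i - n, with 0 used where i - n falls before the start of the group. The name roll_sum_last_n is hypothetical (it is not part of data.table), and the result can differ from the Reduce/shift version by floating-point rounding.

```r
library(data.table)

# Hypothetical helper, not part of data.table: sum of the last n values,
# using however many values are available at the start of the vector.
roll_sum_last_n <- function(v, n) {
  cs <- cumsum(v)
  # cs[i] - cs[i - n]; positions before the start are treated as 0
  cs - shift(cs, n, fill = 0)
}

set.seed(123)
x <- data.table(a = sample(letters[1:3], 100, replace = TRUE), b = runif(100))
setorder(x, a)

# Rolling sum over the last 3 observations within each group of a
res <- x[, .(b, V2 = roll_sum_last_n(b, 3)), by = a]

# Reference: the shift-based approach with fill = 0, for comparison
ref <- x[, .(b, V2 = Reduce(`+`, shift(b, 0:2, fill = 0))), by = a]
all.equal(res$V2, ref$V2)
```

Changing the second argument of the helper changes the window width, and no NA appears at the start of a group because the missing leading terms contribute 0.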