How to calculate deviations from weighted mean in data.table?
I would like to calculate deviations from the (weighted) group mean for many variables in a data.table.
Let's take this example set:
library(data.table)

mydt <- data.table(
  id = c(1, 2, 2, 3, 3, 3),
  x = 1:6,
  y = 6:1,
  w = rep(1:2, 3)
)
mydt
id x y w
1: 1 1 6 1
2: 2 2 5 2
3: 2 3 4 1
4: 3 4 3 2
5: 3 5 2 1
6: 3 6 1 2
I can calculate the weighted means of x and y within each id group as follows:
mydt[
  ,
  lapply(
    as.list(.SD)[c("x", "y")],
    weighted.mean, w = w
  ),
  by = id
]
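For reference, weighted.mean() computes the usual weighted mean within each id group g. Checking the id = 2 group by hand (rows 2 and 3, with weights 2 and 1):

$$
\bar{x}_{w,g} = \frac{\sum_{i \in g} w_i x_i}{\sum_{i \in g} w_i},
\qquad
\bar{x}_{w,2} = \frac{2 \cdot 2 + 1 \cdot 3}{2 + 1} = \frac{7}{3} \approx 2.333,
\qquad
\bar{y}_{w,2} = \frac{2 \cdot 5 + 1 \cdot 4}{2 + 1} = \frac{14}{3} \approx 4.667
$$

The deviations asked about are $d_i = x_i - \bar{x}_{w,g}$; by construction they satisfy $\sum_{i \in g} w_i d_i = 0$, e.g. $2 \cdot (2 - 7/3) + 1 \cdot (3 - 7/3) = 0$ for id = 2.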
(I use the relatively complicated as.list(.SD)[...] construct instead of .SDcols because of this bug.)
I tried to first add the group means as new columns on each row, but could not find how to combine := with lapply().
Just tweak the weighted mean calculation a bit:
mydt[
  ,
  lapply(
    .SD[, .(x, y)],
    # subtract the group's weighted mean from every value in the group
    function(var) var - weighted.mean(var, w = w)
  ),
  by = id
]
id x y
1: 1 0.0000 0.0000
2: 2 -0.3333 0.3333
3: 2 0.6667 -0.6667
4: 3 -1.0000 1.0000
5: 3 0.0000 0.0000
6: 3 1.0000 -1.0000
The solution has been updated with the notational simplification suggested by @DavidArenburg.
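Since the question also asks how to combine := with lapply(), here is a minimal sketch that adds the deviations to mydt by reference instead of returning a new table. It assumes a reasonably recent data.table release in which .SDcols works together with by; the output column names x_dev and y_dev (and the helper variables vars and dev_cols) are hypothetical choices for illustration.

```r
library(data.table)

mydt <- data.table(
  id = c(1, 2, 2, 3, 3, 3),
  x = 1:6,
  y = 6:1,
  w = rep(1:2, 3)
)

# Variables to centre, and hypothetical names for the new deviation columns
vars <- c("x", "y")
dev_cols <- paste0(vars, "_dev")

# Wrapping dev_cols in parentheses tells := to use the names stored in it.
# Within each id group, lapply() returns a list with one deviation vector per
# column of .SD; w is the group's weight column, picked up because j uses it.
mydt[, (dev_cols) := lapply(.SD, function(v) v - weighted.mean(v, w = w)),
     by = id, .SDcols = vars]

print(mydt)
```

Writing to new columns keeps the original x and y intact. Because x and y are integer columns, overwriting them in place would require data.table to coerce the fractional deviations to integer.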