Calculating a weighted mean using data.table in R with weights in one of the table columns

I have the data.table shown below. I'm trying to calculate the weighted mean for subsets of the data. I've tried two approaches, using the MWE below:

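    library(data.table)
    set.seed(12345)
    dt = data.table(a = c(10, 20, 25, 10, 10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
    # add the grouping column by reference, then key the table on it
    dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]
    setkey(dt, key)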

First, I tried subsetting .SD inside an lapply call, which doesn't work (and I didn't really expect it to):

    dt[, lapply(.SD, function(x) weighted.mean(x, .SD[1])), by = key]

Second, I tried defining a function to apply to .SD, as I would if I were using ddply.

This fails too.

    wmn = function(x){
      tmp = NULL
      for(i in 2:ncol(x)){
        tmp1 = weighted.mean(x[,i], x[,1])
        tmp = c(tmp, tmp1)
      }
      return(tmp)
    }

    dt[, wmn, by = key]

Any thoughts on how best to do this?

Thanks

EDIT

Fixed an error in the columns selected by the wmn formula.

SECOND EDIT

Reverted the weighted mean formula and added set.seed.

If you want to take the weighted means of "b" through "e" using "a" as the weight, I think this does the trick:

    dt[, lapply(.SD, weighted.mean, w = a), by = key, .SDcols = letters[2:5]]
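
For anyone who wants to check the result, here is a minimal self-contained sketch of the same idea. It rebuilds the table from the question, restricts .SDcols to "b" through "e" so the weight column is not averaged against itself, and compares one column against a hand-computed sum(w * x) / sum(w). The names value_cols, res and manual_b are just illustrative and are not from the question.

```r
library(data.table)

# Rebuild the example table from the question.
set.seed(12345)
dt <- data.table(a = c(10, 20, 25, 10, 10),
                 b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]
setkey(dt, key)

# Columns to average; "a" is left out because it supplies the weights.
value_cols <- c("b", "c", "d", "e")

# Inside j, `a` is the current group's slice of column a, so it has the
# same length as each column of .SD and can be passed straight to w.
res <- dt[, lapply(.SD, weighted.mean, w = a), by = key, .SDcols = value_cols]

# Cross-check column b by hand: sum(w * x) / sum(w) within each group.
manual_b <- dt[, .(b = sum(a * b) / sum(a)), by = key]
stopifnot(isTRUE(all.equal(res$b, manual_b$b)))

print(res)
```

As far as I can tell, the two attempts in the question fail for different reasons: .SD[1] picks the first row of the group's subset rather than column "a", so it is not a usable weight vector, and dt[, wmn, by = key] only names the function without ever calling it on .SD.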
