
Error using weighted.mean in dcast.data.table

I am experimenting with dcast.data.table and weighted.mean. However, it throws an error for this function.

library(data.table)
dat = data.table(
  x = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3), 
  y = c(4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6), 
  z = c(7:24), 
  w = c(0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.7, 0.7, 0.7)
  )
dcast.data.table(
  dat,
  x~y,
  fun.aggregate = weighted.mean, w = 'w',
  value.var= 'z'
)

# Error in weighted.mean.default(z, w = "w") : 
#   'x' and 'w' must have the same length

There are workarounds that suggest using either dplyr or data.table[], but none explain why dcast doesn't work.

As @Frank points out, the fun.aggregate argument of dcast can only take functions whose output is a single value. However, I don't think that this is the issue with weighted.mean. If I don't specify weights, I get a valid answer:

dcast.data.table(
  dat,
  x~y,
  fun.aggregate = weighted.mean, 
  value.var= 'z'
  # ,w = 'w'
)

This is also demonstrated by the quantile function, which gives me a valid answer when the result for each group is a single value (i.e. when a single value is specified for probs):

dcast.data.table(
  dat,
  x~y,
  fun.aggregate = quantile, 
  value.var= 'z',
  probs = c(0.25)
)

However, when it is written to output a vector for each combination, I get an error consistent with the limitation of fun.aggregate, but different from the error I get when using weighted.mean:

dcast.data.table(
  dat,
  x~y,
  fun.aggregate = quantile, 
  value.var= 'z',
  probs = c(0.25,0.75)
)
# Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.

It seems that dcast does not split the w argument per group and instead passes it whole to the weighted.mean function. I want to understand what internally prevents dcast from doing this.
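
The error text quoted earlier shows the failing call as weighted.mean.default(z, w = "w"), which suggests that the string "w" itself reaches weighted.mean rather than the values of the w column. The following is a minimal base-R sketch of that reading, not a statement about dcast internals; z_group and w_group are illustrative names holding the two rows of the x = 1, y = 4 cell.

```r
# One group's values and weights (the x = 1, y = 4 cell of dat).
z_group <- c(7, 10)
w_group <- c(0.1, 0.9)

# Passing the column name as a string: "w" is a length-1 character vector,
# the values have length 2, so weighted.mean() stops with a length error.
res <- tryCatch(
  weighted.mean(z_group, w = "w"),
  error = function(e) e
)
print(inherits(res, "error"))

# Passing the matching slice of weights works as expected.
ok <- weighted.mean(z_group, w = w_group)
print(ok)
```

If this reading is right, the single-value quantile example works because probs = 0.25 is a constant that is meant to be identical for every group, whereas weights need to be subset alongside the values.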

What about this?

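dat = data.frame(x = c(1,1,2,2),
                 y = c(4,4,5,5),
                 z = c(1,2,3,4),
                 w = c(1,2,1,2))

# Caveat: inside the functions below, dat$w is the whole weight column, not
# the group's slice. It lines up here only because the weights repeat as
# 1, 2 in every group.

# weighted.sum
reshape2::dcast(data = dat, formula = x ~ y,
                fun.aggregate = function(x){mean(x*dat$w)*length(x)},
                value.var = c('z'))

# weighted.mean (as labelled; for this data it yields sum(z*w)/length(z)
# per group, which is not divided by sum(w) as stats::weighted.mean does)
reshape2::dcast(data = dat, formula = x ~ y,
                fun.aggregate = function(x){mean(x*dat$w)},
                value.var = c('z'))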
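
For data where the weights do not repeat in the same pattern in every group, one option is the data.table[] workaround mentioned in the question: aggregate first, then reshape. The sketch below assumes the question's dat (rebuilt here with rep() for brevity, same values); agg, wm and wide are illustrative names.

```r
library(data.table)

# Same values as the question's dat.
dat <- data.table(
  x = rep(1:3, times = 6),
  y = rep(4:6, each = 6),
  z = 7:24,
  w = rep(c(0.1, 0.9, 0.2, 0.8, 0.3, 0.7), each = 3)
)

# Step 1: weighted mean per (x, y) cell. Inside [ ], z and w are both
# subset by group, so each call sees matching values and weights.
agg <- dat[, .(wm = weighted.mean(z, w)), by = .(x, y)]

# Step 2: reshape to wide. Each (x, y) cell now holds exactly one value,
# so no fun.aggregate is needed.
wide <- dcast(agg, x ~ y, value.var = "wm")
print(wide)
```

This keeps the weighting logic in the grouping step and leaves dcast to do only the reshaping.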
