R bootstrap weighted mean by group with data.table

Question

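I am trying to combine two approaches:

Bootstrapping multiple columns in data.table in a scalable fashion

and

Bootstrap weighted mean in R

Here is some random data: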

## Generate sample data

# Function to randomly generate weights
set.seed(7)
rtnorm <- function(n, mean, sd, a = -Inf, b = Inf){
qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}

# Generate variables
nps    <- round(runif(3500, min=-1, max=1), 0) # nps value which takes 1, 0 or -1
group  <- sample(letters[1:11], 3500, TRUE) # groups
weight <- rtnorm(n=3500, mean=1, sd=1, a=0.04, b=16) # weights between 0.04 and 16

# Build data frame
df = data.frame(group, nps, weight)

# The following packages / libraries are required:
require("data.table")
require("boot")

Here is the code from the weighted-mean post above, which bootstraps the weighted mean:

samplewmean <- function(d, i, j) {
  d <- d[i, ]
  w <- j[i, ]
  return(weighted.mean(d, w))   
}

results_qsec <- boot(data= df[, 2, drop = FALSE], 
                     statistic = samplewmean, 
                     R=10000, 
                     j = df[, 3 , drop = FALSE])

This works fine.

Here is the code from the data.table post above, which bootstraps the mean by group in a data.table:

dt = data.table(df)
stat <- function(x, i) {x[i, (m=mean(nps))]}
dt[, list(list(boot(.SD, stat, R = 100))), by = group]$V1

This also works fine.

I am having trouble combining the two approaches:

Running …

dt[, list(list(boot(.SD, samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1

... gives this error message:

Error in weighted.mean.default(d, w) : 
  'x' and 'w' must have the same length

Running …

dt[, list(list(boot(dt[, 2 , drop = FALSE], samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1

… gives a different error:

Error in weighted.mean.default(d, w) : 
  (list) object cannot be coerced to type 'double'

I still have trouble understanding the arguments in data.table and how to combine functions that operate on a data.table.

I would appreciate any help.

Answer 1

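This has to do with how data.table behaves inside a function's scope. Inside samplewmean, d is still a data.table even after subsetting with i, whereas weighted.mean expects numeric vectors for the values and the weights. If you unlist before calling weighted.mean, you can fix this error:

Error in weighted.mean.default(d, w) : (list) object cannot be coerced to type 'double'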
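To make the point about lists versus vectors concrete, here is a minimal sketch. The tables d and w are small made-up stand-ins, not the question's data. It shows that a row subset of a data.table is still a list, that length() counts columns (which may be why passing the two-column .SD together with a one-column weight table triggers the length error in the question), and that unlist() yields vectors weighted.mean accepts.

```r
library(data.table)

# Hypothetical one-column tables standing in for the value and weight columns
d <- data.table(nps = c(-1, 0, 1, 1))
w <- data.table(weight = c(0.5, 1, 2, 0.5))

# Row subsetting returns another data.table, which is a list of columns
sub_d <- d[c(1, 3, 3), ]
is.list(sub_d)

# length() of a data.table is its number of columns, not its number of rows
length(d)

# unlist() flattens each one-column table into a plain numeric vector
wm <- weighted.mean(unlist(d), unlist(w))
wm
```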

Code that unlists before passing to weighted.mean:

samplewmean <- function(d, i, j) {
  d <- d[i, ]
  w <- j[i, ]
  return(weighted.mean(unlist(d), unlist(w)))   
}

dt[, list(list(boot(dt[, 2 , drop = FALSE], samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1
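One caveat about the call above: inside by = group, dt[, 2, drop = FALSE] and dt[, 3, drop = FALSE] still refer to the whole table, so every group bootstraps all rows rather than its own. A minimal sketch of a per-group variant follows, assuming the goal is one bootstrap per group. The data set, the statistic name wmean_stat and the column name bt are illustrative choices, not part of the original answer.

```r
library(data.table)
library(boot)

set.seed(7)
# Small illustrative data set (not the question's data)
dt <- data.table(
  group  = sample(letters[1:3], 300, TRUE),
  nps    = sample(c(-1, 0, 1), 300, TRUE),
  weight = runif(300, 0.04, 16)
)

# Statistic: resample rows with i, then take the weighted mean of plain vectors
wmean_stat <- function(d, i) {
  d <- d[i, ]
  weighted.mean(d$nps, d$weight)
}

# .SD holds only the current group's rows, so each boot object is group-specific
res <- dt[, list(bt = list(boot(.SD, wmean_stat, R = 200))), by = group]

# t0 is the statistic evaluated on the original rows of each group
res[, t0 := sapply(bt, function(b) b$t0)]
res[, list(group, t0)]
```

Because values and weights live in the same .SD, a single index vector resamples both together, which removes the need for the extra j argument.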

A more data.table-like syntax (data.table version >= v1.10.2) would probably be:

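# boot() passes the resampled row indices as the second positional argument.
# valCol and wgtCol are matched by name, so the indices land in i and must be
# used to subset d; otherwise every replicate equals the original estimate.
samplewmean <- function(d, i, valCol, wgtCol) {
    d <- d[i, ]
    weighted.mean(unlist(d[, ..valCol]), unlist(d[, ..wgtCol]))
}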

dt[, list(list(boot(.SD, statistic=samplewmean, R=1, valCol="nps", wgtCol="weight"))), by=group]$V1
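As for where the extra argument comes from: with the default settings, boot() calls the statistic as statistic(data, indices, ...), once with the original row order and then once per replicate. Whichever formal is left unmatched after the named extras therefore receives the indices. The sketch below records what arrives in that slot; stat, idx, label and seen are hypothetical names used only for illustration.

```r
library(boot)

set.seed(1)
x <- data.frame(v = c(1, 2, 3, 4, 5))

# Collects the index vector boot() supplies on each call
seen <- list()

# label is an extra named argument, forwarded through boot()'s ...
stat <- function(d, idx, label) {
  seen[[length(seen) + 1]] <<- idx
  mean(d$v[idx])
}

b <- boot(x, stat, R = 3, label = "demo")

# First call: the original indices, used to compute b$t0
seen[[1]]
# Later calls: resampled indices, one set per replicate
seen[-1]
```

A statistic that ignores this argument returns the same value on every call, so the bootstrap distribution collapses to a single point.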

Another possible syntax is (see data.table FAQ 1.6):

# i receives the resampled row indices from boot(); subset before computing
samplewmean <- function(d, i, valCol, wgtCol) {
    d <- d[i, ]
    weighted.mean(unlist(d[, eval(substitute(valCol))]), unlist(d[, eval(substitute(wgtCol))]))
}

dt[, list(list(boot(.SD, statistic=samplewmean, R=1, valCol=nps, wgtCol=weight))), by=group]$V1

Answer 1: score 2, accepted, 2018-02-21 01:24:51
