R: How to use describe() with sample weights

Question

I have a data file with raw scores and sample weights. Now I want to use the describe function of the psych package while taking the sample weights into account.

Does anyone know how to do that, or is there a function somewhere that does exactly the same as psych::describe() but can handle sample weights?

The following example gives some insight into what I intend to do.

library(psych)
describe(c(2,3,4,1,4,5,3,3))
#gives:
  vars n mean   sd median trimmed  mad min max range skew kurtosis   se
1    1 8 3.12 1.25      3    3.12 1.48   1   5     4 -0.2    -1.16 0.44

The sample weights are:

c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1)

The weighted mean would be (correct me if I am wrong):

sum(c(2,3,4,1,4,5,3,3)* c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1))/sum(c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1))
[1] 2.898305

That is, of course, different from the unweighted mean. How can I make sure that the reported SD, kurtosis, skewness, etc. are based on the sample-weighted mean as well?

Answer 1

As the psych package does not handle weights, and there is no alternative package that offers an equivalent collection of weighted descriptives, one has to cherry-pick from different packages and combine the output the way psych::describe() does.

Also, weighted descriptives typically have to be computed from the individual cases together with the weight assigned to each case, so "shortcuts" based on summary statistics won't work. (For example, the weighted standard error is generally not equal to the weighted standard deviation divided by the square root of the sample size.)
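To make the parenthetical concrete, here is a minimal base-R sketch using the data from the question. The weighted variance formula and the use of Kish's effective sample size for the standard error are assumptions chosen for illustration; they are one common convention, not necessarily what any particular package computes.

```r
# Data and sample weights from the question
x <- c(2, 3, 4, 1, 4, 5, 3, 3)
w <- c(0.2, 0.5, 1.2, 1.5, 0.2, 0.6, 0.6, 1.1)

# Weighted mean (base R, stats package)
w_mean <- weighted.mean(x, w)

# Assumption: population-style weighted variance with normalized weights
w_sd <- sqrt(sum(w * (x - w_mean)^2) / sum(w))

# Assumption: Kish's effective sample size as the divisor for the SE
n_eff <- sum(w)^2 / sum(w^2)

se_naive <- w_sd / sqrt(length(x))  # the "shortcut" using the raw n
se_kish  <- w_sd / sqrt(n_eff)      # accounts for unequal weights

print(c(n = length(x), n_eff = n_eff, se_naive = se_naive, se_kish = se_kish))
```

With unequal weights the effective sample size is smaller than the raw n, so the shortcut understates the standard error under this convention.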

Here's a simple wrapper function that mimics the behavior of psych::describe() for weighted data:

    wtd.describe <- function(x, weights=NULL, trim=.1){
      require(TAM)
      require(diagis)
      require(robsurvey)
      out <- NULL
      # Handling simple vectors
      x <- as.data.frame(x)
      # If no weights are given, weight every case equally (all weights = 1)
      if(is.null(weights)) {weights <- rep(1, nrow(x))}
      i <- 1
      for(colname in colnames(x)){
        # Removing rows with missing data or weight
        d <- x[complete.cases(x[[colname]], weights), , drop=FALSE][[colname]]
        w <- weights[complete.cases(x[[colname]], weights)]
        wd <- data.frame(
          "vars"     = i,
          "n"        = length(d),
          "mean"     = TAM::weighted_mean(d, w = w),
          "sd"       = TAM::weighted_sd(d, w = w),
          "median"   = robsurvey::weighted_median(d, w = w, na.rm = TRUE),
          "trimmed"  = robsurvey::weighted_mean_trimmed(d, w = w, LB = trim, UB = (1 - trim), na.rm = TRUE),  
          "mad"      = robsurvey::weighted_mad(d, w = w, na.rm = TRUE, constant = 1.4826),
          "min"      = min(d),
          "max"      = max(d),
          "range"    = max(d) - min(d),
          "skew"     = TAM::weighted_skewness(d, w = w),
          "kurtosis" = TAM::weighted_kurtosis(d, w = w),
          "se"       = diagis::weighted_se(d, w = w, na.rm = TRUE),
          row.names  = colname
        )
        i <- i+1
        out <- rbind(out, wd)
      }
      return(out)
    }

Please note that:

I have not evaluated the quality or maintenance status of the packages used. Feel free to pick your own and swap them in.
Most of the convenience parameters of psych::describe() are not emulated by the above function.
na.rm = TRUE is implied, because the function drops cases with a missing value or weight and the TAM functions remove missing values implicitly.
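
If you want a dependency-free cross-check for some of the columns, the following sketch computes the weighted mean, SD, skewness and excess kurtosis from normalized-weight moments. The function name wtd_moments is hypothetical, and the plain moment formulas (no small-sample bias correction) are an assumption, so the values may differ slightly from what TAM or other packages report.

```r
# Hypothetical helper: weighted descriptives from plain weighted moments
wtd_moments <- function(x, w) {
  # Drop cases with a missing value or a missing weight
  keep <- complete.cases(x, w)
  x <- x[keep]
  w <- w[keep]
  p  <- w / sum(w)              # normalized weights, sum to 1
  m  <- sum(p * x)              # weighted mean
  m2 <- sum(p * (x - m)^2)      # second central moment
  m3 <- sum(p * (x - m)^3)      # third central moment
  m4 <- sum(p * (x - m)^4)      # fourth central moment
  c(mean = m, sd = sqrt(m2), skew = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# Usage with the data from the question
x <- c(2, 3, 4, 1, 4, 5, 3, 3)
w <- c(0.2, 0.5, 1.2, 1.5, 0.2, 0.6, 0.6, 1.1)
print(round(wtd_moments(x, w), 3))
```

Comparing these numbers against the wrapper's output is a quick way to spot a package that uses a different variance or bias-correction convention.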

Answer 1 (2 votes, accepted), 2020-06-08 21:39:46