简体   繁体   中英

R: How to use describe() with sample weights

I have a datafile with raw scores and with sample weights. Now I want to use the describe function of the psych package, taking into account the sample weights.

Does anyone know how to do that, or is there a function somewhere that does exactly te same as psych::describe() but can handle sample weights?

The next example will give some insight in what I intend to do.

library(psych)
describe(c(2,3,4,1,4,5,3,3))
#gives:
     vars n mean   sd median trimmed  mad min max range skew kurtosis   se
1    1 8 3.12 1.25      3    3.12 1.48   1   5     4 -0.2    -1.16 0.44

The sample weights are:

c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1)

The weighted mean would be (correct me if I am wrong):

sum(c(2,3,4,1,4,5,3,3)* c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1))/sum(c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1))
[1] 2.898305

So that's, ofcourse different from the unweighted mean. How can I make sure that the reported SD, kurtosis, skewness etc. are based on the sample weighted mean as well?

As the psych package does not handle weights, and there is no alternative package that serves an equivalent collection of weighted descriptives, one has to cherry pick from different packages and combine the output like psych::describe() does.

Also, the calculation of weighted descriptives typically need to be supplied with each case in the data along with the individual weights assigned those cases, therefore "shortcuts" won't work. (For example, the Weighted Standard Error will not be equal to the Weighted Standard Deviation divided by the square root of the sample size .)

Here's a simple wrapper function that mimics the behavior of psych::describe() for weighted data:

    wtd.describe <- function(x, weights=NULL, trim=.1){
      require(TAM)
      require(diagis)
      require(robsurvey)
      out <- NULL
      # Handling simple vectors
      x <- as.data.frame(x)
      # If no weights given, all weights = 1
      if(is.null(weights)) {weights <- seq(1, nrow(x))}
      i <- 1
      for(colname in colnames(x)){
        # Removing rows with missing data or weight
        d <- x[complete.cases(x[[colname]], weights), , drop=FALSE][[colname]]
        w <- weights[complete.cases(x[[colname]], weights)]
        wd <- data.frame(
          "vars"     = i,
          "n"        = length(d),
          "mean"     = TAM::weighted_mean(d, w = w),
          "sd"       = TAM::weighted_sd(d, w = w),
          "median"   = robsurvey::weighted_median(d, w = w, na.rm = TRUE),
          "trimmed"  = robsurvey::weighted_mean_trimmed(d, w = w, LB = trim, UB = (1 - trim), na.rm = TRUE),  
          "mad"      = robsurvey::weighted_mad(d, w = w, na.rm = TRUE, constant = 1.4826),
          "min"      = min(d),
          "max"      = max(d),
          "range"    = max(d) - min(d),
          "skew"     = TAM::weighted_skewness(d, w = w),
          "kurtosis" = TAM::weighted_kurtosis(d, w = w),
          "se"       = diagis::weighted_se(d, w = w, na.rm = TRUE),
          row.names  = colname
        )
        i <- i+1
        out <- rbind(out, wd)
      }
      return(out)
    }

Please note that:

  • I have not taken into account the quality and maintenance status of the packages used. Feel free to pick your own and swap them in.
  • Most of the convenience parameters of psych:describe() are not emulated by the above function.
  • na.rm = TRUE is implied, as the TAM package does implicit na.rm = TRUE .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM