简体   繁体   中英

weighted median in spatstat package

The weighted.median() function in the spatstat package returns "10.5", when I pass the evenly weighted scores of 10, 11, & 12. I was expecting the response of "11" (which is the output of stats::median() and matrixStats::weightedMedian() ).

The concept of a weighted median isn't very natural to me. Is the output incorrect, or am I misunderstand the purpose of the function?

x <- c(10, 11, 12)
w <- c( 1,  1,  1)

spatstat::weighted.median(x, w)
#> [1] 10.5
spatstat::weighted.quantile(x, w, probs = .5)
#>  50% 
#> 10.5


matrixStats::weightedMedian(x, w)
#> [1] 11
median(x)
#> [1] 11

Created on 2020-02-23 by the reprex package (v0.3.0)

There is a more fundamental issue here about the definition of a quantile (including the median) in small finite samples.

The help file for the R base function quantile.default says that there is an argument type , with 7 different options, which will give different answers. These are carefully described in a fine article by Rob Hyndman, cited in the help file. The default for quantile.default is type=7 .

The algorithm in spatstat::weighted.quantile performs the analogue of type=4 (according to its help file); that is, the cumulative distribution function F(x) is linearly interpolated and then the inverse function is computed. This algorithm is correctly implemented in the spatstat code.

The weighted median in the other package you mentioned is computing a different definition of the weighted median.

Thank you very much for drawing our attention to this example. This may prompt us to extend the implementation of spatstat::weighted.median to embrace the other types.

Incidentally, bug reports for a CRAN package should be posted on the package's bug reports page as shown on CRAN. It's just lucky that I saw this post. But thank you very much, both of you, for spotting this issue.

I believe this is a flaw in the package, and I'll explain why.

Firstly, weighted.median actually just calls weighted.quantile with the probs vector set to 0.5 . But if you call weighted.quantile with your data, you get very strange results:

weighted.quantile(x, w)
#>    0%   25%   50%   75%  100% 
#> 10.00 10.00 10.50 11.25 12.00 

That's not right.

If you look at the body of this function using body(weighted.quantile) , and follow the logic through, there seems to be a problem with the way the weights are normalized on line 10 into a variable called Fx . To work properly, the normalized weights should be a vector of the same length as x , but starting at 0 and ending in 1, with the spacing in between being proportional to the weights.

But if you look at how this is actually calculated:

body(weighted.quantile)[[10]]
#> Fx <- cumsum(w)/sum(w)

You can see it doesn't start at 0. In your case, the first element would be 0.3333.

So to show this is the case, let's write over this line with the correct expression. (First we need to unlock the binding to give access to the function)

unlockBinding("weighted.quantile", asNamespace("spatstat"))
body(weighted.quantile)[[10]] <- substitute(Fx <- (cumsum(w) - min(w))/(sum(w) - min(w)))

Now we get the correct result for weighted quantiles (including the correct median)

weighted.quantile(x, w)
#>   0%  25%  50%  75% 100% 
#> 10.0 10.5 11.0 11.5 12.0 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM