How to calculate the median of an unsorted dataset

Question

I have thousands of datasets like this one:

>student1
    quantities score
[1]          4    10         
[2]          1    12         
[3]         78     5         
[4]          6   294

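I want to calculate the median score for this student. Each score comes with a quantity, that is, the number of times it occurs. In this case I want the result to be 5, because the median falls among the 78 occurrences of the score 5.

I have looked at some posts here, such as "How to calculate the median of a grouped dataset?", but I cannot use that approach because I have thousands of datasets.

I also tried installing the aroma.light and matrixStats packages, but I still cannot use the weighted median function. R tells me: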

Error: could not find function "weightedMedians"

The above was only an example. My real dataset looks like this:

>test
     [,1]          [,2]
info    3            10
info    2            20
        4      86779637
        1        135777
        7          2342

But when I try to use

>rep(test[, 1], test[, 2])

I get

Error in rep(test[, 1], test[, 2]) : invalid 'times' argument
In addition: Warning message:
NAs introduced by coercion

What can I do now?

Answer 1

You can use:

median(rep(student1$score, student1$quantities))

This is relatively fast (only a few seconds for a simulated dataset with 100k rows).
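Expanding with rep() allocates one element per unit of quantity, which may become a concern for counts as large as the 86779637 shown in the question. As a minimal sketch of an alternative, the median can be read off the cumulative quantities of the sorted scores. The helper name weighted_median_cumsum is hypothetical, not part of any package. It assumes the quantities are non-negative counts, and when the total count is even it returns the lower of the two middle values rather than their average, so it can differ from median(rep(...)) in that case. Because the warning in the question suggests that some entries could not be converted to numbers, the sketch converts its inputs and stops if that produces NA.

```r
# Hypothetical helper: weighted median without expanding the data with rep().
weighted_median_cumsum <- function(score, quantities) {
  # Convert explicitly, since the inputs may come from a character matrix.
  score <- as.numeric(score)
  quantities <- as.numeric(quantities)
  if (anyNA(score) || anyNA(quantities)) {
    stop("non-numeric or missing values in score or quantities")
  }
  ord <- order(score)              # sort scores in ascending order
  cum <- cumsum(quantities[ord])   # running total of occurrences
  # First sorted score whose running total reaches half of all occurrences.
  score[ord][which(cum >= sum(quantities) / 2)[1]]
}

# The example data from the question.
student1 <- data.frame(quantities = c(4, 1, 78, 6),
                       score = c(10, 12, 5, 294))
result <- weighted_median_cumsum(student1$score, student1$quantities)
print(result)
```

For student1 the total count is 89, and the sorted score 5 already covers 78 occurrences, so the sketch agrees with median(rep(student1$score, student1$quantities)).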

Answer 2

The function in the matrixStats package that computes a weighted median is called weightedMedian() (singular, without the trailing "s"), for example:

> library("matrixStats")
matrixStats v0.14.0 (2015-02-13) successfully loaded. See ?matrixStats for help.
> weightedMedian(student1$score, w=student1$quantities)
[1] 5.670732
> weightedMedian(student1$score, w=student1$quantities, interpolate=FALSE)
[1] 5
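Since the question mentions thousands of datasets, one way to apply a per-dataset median to all of them is to keep the datasets in a named list and loop over it with vapply(). The following is a minimal sketch in base R only. The list name students and the second dataset student2 are made up for illustration, and median(rep(...)) from Answer 1 is used as the per-dataset function; a call to weightedMedian() could be swapped in at the same place.

```r
# Hypothetical collection of datasets; student2 is invented for illustration.
students <- list(
  student1 = data.frame(quantities = c(4, 1, 78, 6),
                        score = c(10, 12, 5, 294)),
  student2 = data.frame(quantities = c(1, 3, 1),
                        score = c(1, 2, 3))
)

# Compute one median per dataset; vapply() checks that each result
# is a single numeric value.
medians <- vapply(
  students,
  function(d) median(rep(d$score, d$quantities)),
  numeric(1)
)
print(medians)
```

The result is a named numeric vector with one entry per dataset, which keeps each median attached to the dataset it came from.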
