
How does numpy find the median in an array/list?

I read that numpy uses introselect to find the median of an array/list ( https://www.researchgate.net/publication/303755458_Fast_Deterministic_Selection , page 2, last 5 lines). But I couldn't find any hint of that in the numpy source code: https://github.com/numpy/numpy/blob/v1.19.0/numpy/lib/function_base.py#L3438-L3525

Does anyone know where I can find numpy's implementation of introselect? Or, if numpy doesn't use introselect, what algorithm does it use to find the median?

Many thanks in advance :)

The main median function seems to be at line 3528. If we cut out all the multidimensional and NaN handling, we get something like:

def _median(a, axis=None, out=None, overwrite_input=False):
    # can't reasonably be implemented in terms of percentile as we have to
    # call mean to not break astropy

    # Set the partition indexes (simplified: treat `a` as flattened)
    sz = a.size
    if sz % 2 == 0:
        szh = sz // 2
        kth = [szh - 1, szh]
    else:
        kth = [(sz - 1) // 2]

    # partition puts the kth elements into their sorted positions
    part = partition(a, kth, axis=None)

    # average the one or two middle elements
    return mean(part[kth], axis=None, out=out)

So partition is doing all the work; it comes from

from numpy.core.fromnumeric import (
    ravel, nonzero, partition, mean, any, sum
    )
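To see the partition step in isolation, here is a minimal sketch built on the public np.partition function. The helper name median_via_partition is hypothetical, and it only covers the flattened case with no NaN handling, so it is an illustration of the idea rather than numpy's actual code path.

```python
import numpy as np

def median_via_partition(a):
    """Median of all elements of `a`, using np.partition (flattened case only)."""
    a = np.asarray(a).ravel()
    n = a.size
    if n == 0:
        raise ValueError("cannot take the median of an empty input")
    if n % 2 == 0:
        kth = [n // 2 - 1, n // 2]   # two middle positions
    else:
        kth = [(n - 1) // 2]         # single middle position
    # After partitioning, the elements at positions `kth` are exactly the ones
    # a full sort would put there; everything else is only partially ordered.
    part = np.partition(a, kth)
    return part[kth].mean()

data = [7, 1, 5, 3, 9, 2]
print(median_via_partition(data), np.median(data))
```

Both printed values should agree, since np.median relies on the same partitioning idea internally.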

If we follow partition into the numpy sources, we get to the following C code:

NPY_SELECTKIND sortkind = NPY_INTROSELECT;

and

val = PyArray_Partition(self, ktharray, axis, sortkind);

That call ends up in the C selection routine, which uses

mid = ll + median_of_median5_@suff@(v + ll, hh - ll, NULL, NULL);

So it is introselect.

Once a depth limit of twice the expected recursion depth is reached, the algorithm switches to median-of-medians-of-5 pivot selection until the partition has fewer than 5 elements.
