How to create the histogram of an array with masked values, in Numpy?

In Numpy 1.4.1, what is the simplest or most efficient way of calculating the histogram of a masked array? By default, numpy.histogram and pyplot.hist do count the masked elements!

The only simple solution I can think of right now involves creating a new array with the non-masked values:

histogram(m_arr[~m_arr.mask])

This is not very efficient, though, as it unnecessarily creates a new array. I'd be happy to read about better ideas!
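A small sketch of the problem and of the workaround above, on made-up data (the array contents and `bins=4` are illustrative assumptions). `np.histogram` converts its input to a plain ndarray, so the masked `100.0` is counted and also stretches the automatically chosen range. `np.ma.getmaskarray` is used instead of `m_arr.mask` so the boolean index also works when the mask is `np.ma.nomask`.

```python
import numpy as np

# Toy masked array: 100.0 is masked and should not be counted.
m_arr = np.ma.masked_array([1.0, 2.0, 2.5, 100.0], mask=[False, False, False, True])

# np.histogram drops the mask, so all four values are counted
# and the automatic range runs from 1.0 to 100.0.
counts_all, edges_all = np.histogram(m_arr, bins=4)

# Workaround from the question: index with the inverted mask.
# getmaskarray always returns a full boolean array (even for nomask).
keep = ~np.ma.getmaskarray(m_arr)
counts, edges = np.histogram(m_arr[keep], bins=4)

print(counts_all.sum(), counts.sum())  # 4 3
```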

(Undeleting this as per the discussion above...)

I'm not sure whether the numpy developers would consider this a bug or expected behavior. I asked on the mailing list, so I guess we'll see what they say.

Either way, it's an easy fix. Patching numpy/lib/function_base.py to use numpy.asanyarray rather than numpy.asarray on the function's inputs will allow it to properly use masked arrays (or any other subclass of ndarray) without creating a copy.
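The difference the patch relies on can be seen directly; this is only a sketch of the two conversion functions, not the patch itself. `np.asarray` returns a base-class ndarray, so the mask is lost, while `np.asanyarray` passes a `MaskedArray` through unchanged. Whether the rest of the histogram code then honours the mask is a separate question.

```python
import numpy as np

m_arr = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

plain = np.asarray(m_arr)    # base ndarray: the mask is gone
kept = np.asanyarray(m_arr)  # subclass preserved: the mask survives

print(type(plain).__name__, type(kept).__name__)  # ndarray MaskedArray
print(kept is m_arr)  # True: an existing MaskedArray is returned as-is, no copy
```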

Edit: It seems like this is expected behavior. As discussed here:

If you want to ignore masked data, it's just one extra function call:

histogram(m_arr.compressed())

I don't think the fact that this makes an extra copy will be relevant, because I guess full masked array handling inside histogram would be a lot more expensive.

Using asanyarray would also let in matrices and other subtypes that might not be handled correctly by the histogram calculations.

For anything besides dropping masked observations, it would be necessary to figure out what the masked-array definition of a histogram is, as Bruce pointed out.

Try hist(m_arr.compressed()).
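For reference, `compressed()` returns a new 1-D ndarray containing only the unmasked values (flattened in C order), which is why it can be passed straight to `numpy.histogram` or `pyplot.hist`. A minimal sketch on toy 2-D data; the values and bin edges are assumptions chosen for illustration:

```python
import numpy as np

m_arr = np.ma.masked_array(
    [[0.5, 1.5, 9.0],
     [2.5, 3.5, 9.0]],
    mask=[[False, False, True],
          [False, False, True]],
)

# Only the unmasked values survive, as a flat copy.
values = m_arr.compressed()

counts, edges = np.histogram(values, bins=[0, 1, 2, 3, 4])
print(values)  # [0.5 1.5 2.5 3.5]
print(counts)  # [1 1 1 1]
```

The total count equals `m_arr.count()`, the number of unmasked elements.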

This is a super old question, but these days I just use:

numpy.histogram(m_arr, bins=.., range=.., density=False, weights=m_arr_mask)

where m_arr_mask is an array with the same shape as m_arr, containing 0 for elements of m_arr that should be excluded from the histogram and 1 for elements that should be included.
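A sketch of the weights approach under a few assumptions of ours: the weights are built from the mask as floats, the bin edges are given explicitly, and masked slots are filled with a finite value first. The last two matter because a zero weight does not stop a masked element from taking part in automatic range detection: a masked NaN makes `np.histogram` raise a `ValueError`, and a masked outlier such as `5.0` would stretch the bins.

```python
import numpy as np

m_arr = np.ma.masked_array([0.2, 0.7, np.nan, 1.4, 5.0],
                           mask=[False, False, True, False, True])

# 1.0 for elements to include, 0.0 for masked ones (float is our choice).
m_arr_mask = (~np.ma.getmaskarray(m_arr)).astype(float)

# Replace masked slots (NaN, outlier) with a finite value; their weight is 0.
data = m_arr.filled(0.0)

counts, edges = np.histogram(data, bins=[0.0, 0.5, 1.0, 1.5],
                             density=False, weights=m_arr_mask)
print(counts)  # [1. 1. 1.]
```

With weights, the counts come back as floats; they match `np.histogram(m_arr.compressed(), bins=edges)`.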

After running into casting issues when trying Erik's solution (see https://github.com/numpy/numpy/issues/16616), I decided to write a numba function to achieve this behavior.

Some of the code was inspired by https://numba.pydata.org/numba-examples/examples/density_estimation/histogram/results.html. I added the mask bit.

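import numpy
import numba


@numba.jit(nopython=True)
def compute_bin(x, bin_edges):
    # Assumes uniform (equal-width) bins for now.
    # Returns the bin index for x, or -1 if x is not counted.
    n = bin_edges.shape[0] - 1
    a_min = bin_edges[0]
    a_max = bin_edges[-1]

    # Reject out-of-range values (and NaN, which fails both comparisons).
    # Checking this first matters: int() truncates toward zero, so values
    # just below a_min would otherwise land in bin 0.
    if not (a_min <= x and x <= a_max):
        return -1

    # Special case to mirror NumPy behavior: a_max always goes in the last bin.
    if x == a_max:
        return n - 1

    # Clamp in case floating-point rounding pushes the index up to n.
    return min(int(n * (x - a_min) / (a_max - a_min)), n - 1)


@numba.jit(nopython=True)
def masked_histogram(img, bin_edges, mask):
    # mask has the same shape as img and is True for elements to COUNT
    # (the opposite of numpy.ma, where True means "masked out").
    hist = numpy.zeros(len(bin_edges) - 1, dtype=numpy.intp)

    for i, value in enumerate(img.flat):
        if mask.flat[i]:
            b = compute_bin(value, bin_edges)
            if b >= 0:
                hist[b] += 1
    return hist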
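To sanity-check results from the numba version without installing numba, a vectorised NumPy equivalent can help. This is a sketch; `masked_histogram_numpy` is a hypothetical name, and it keeps the same assumptions (uniform bins, `mask` True for elements to count). Unlike the numba loop, it copies the selected values, so it is meant as a cross-check rather than a faster replacement.

```python
import numpy as np


def masked_histogram_numpy(img, bin_edges, mask):
    """Hypothetical NumPy reference for masked_histogram (uniform bins assumed).

    mask is True for elements to count, as in the numba version.
    """
    values = np.asarray(img, dtype=float).ravel()
    keep = np.asarray(mask, dtype=bool).ravel()
    n = len(bin_edges) - 1
    a_min, a_max = bin_edges[0], bin_edges[-1]

    # Drop masked and out-of-range values (NaN fails both comparisons).
    values = values[keep & (values >= a_min) & (values <= a_max)]

    # Same index formula as compute_bin; clamping puts a_max in the last bin.
    idx = (n * (values - a_min) / (a_max - a_min)).astype(np.intp)
    idx = np.minimum(idx, n - 1)
    return np.bincount(idx, minlength=n)
```

Usage: `masked_histogram_numpy(img, np.linspace(0.0, 1.0, 11), mask)` should agree with `np.histogram(img[mask], bins=np.linspace(0.0, 1.0, 11))[0]` for values that do not sit exactly on a bin edge.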

The speedup is significant on a (1000, 1000) image.
