Taking mean of numpy ndarray with masked elements

I have an MxN array of values taken from an experiment. Some of these values are invalid and are set to 0 to indicate this. I can construct a mask of valid/invalid values using

mask = (mat1 == 0) & (mat2 == 0)

which produces an MxN array of bool. Note that the masked locations do not neatly follow columns or rows of the matrix, so simply cropping the matrix is not an option.

Now, I want to take the mean along one axis of my array (e.g. end up with a 1xN array) while excluding those invalid values from the mean calculation. Intuitively I thought

 np.mean(mat1[mask],axis=1)

should do it, but the mat1[mask] operation produces a 1D array containing just the elements where mask is true, which doesn't help when I only want a mean across one dimension of the array.
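To make the flattening concrete, here is a minimal sketch. The 3x4 array and the name `valid` are made up for illustration and are not the data from the question:

```python
import numpy as np

# Hypothetical 3x4 example data; 0 marks an invalid reading.
mat1 = np.array([[1, 0, 3, 4],
                 [0, 2, 0, 4],
                 [5, 6, 7, 0]])
valid = mat1 != 0          # True where the value is usable

# Boolean indexing returns the selected elements as a flat 1D array,
# so the row/column structure needed for a per-axis mean is lost.
selected = mat1[valid]
print(selected.shape)      # (8,)
print(selected)            # [1 3 4 2 4 5 6 7]
```

Because the result has only one dimension, asking for axis=1 on it cannot work.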

Is there a 'python-esque' or numpy way to do this? I suppose I could use the mask to set masked elements to NaN and use np.nanmean, but that still feels kind of clunky. Is there a way to do this 'cleanly'?

I think the best way to do this would be something along the lines of:

masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)

Then take the mean with

masked.mean(axis=1)
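Put together, the masked-array approach might look like the sketch below. The two small arrays are invented test data, `invalid` is just a name chosen here, and mat1 is assumed to be the array being averaged:

```python
import numpy as np

# Hypothetical 2x3 test data; an entry counts as invalid when it is 0 in both arrays.
mat1 = np.array([[1.0, 0.0, 3.0],
                 [4.0, 5.0, 0.0]])
mat2 = np.array([[2.0, 0.0, 1.0],
                 [3.0, 1.0, 0.0]])

invalid = (mat1 == 0) & (mat2 == 0)          # True where the value should be ignored
masked = np.ma.masked_where(invalid, mat1)   # masked entries are skipped by reductions

row_means = masked.mean(axis=1)   # one mean per row, ignoring masked entries
col_means = masked.mean(axis=0)   # one mean per column, ignoring masked entries
print(row_means)
print(col_means)
```

Note that the condition passed to np.ma.masked_where is True for the elements to hide, which is the opposite sense of a "valid" mask. A row or column with no valid entries comes back masked rather than as a number.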

One similarly clunky but efficient way is to multiply your array by the mask, which sets the masked values to zero. You then have to divide by the number of non-masked values manually, hence the clunkiness. But this works with integer-valued arrays, which can't be said of the NaN approach. It also seems to be the fastest for both small and larger arrays (compared with alternatives including the masked array solution in another answer):

import numpy as np

def nanny(mat, mask):
    mat = mat.astype(float).copy() # don't mutate the original
    mat[~mask] = np.nan            # mask values
    return np.nanmean(mat, axis=0) # compute mean

def manual(mat, mask):
    # zero masked values, divide by number of nonzeros
    return (mat*mask).sum(axis=0)/mask.sum(axis=0)

# set up dummy data for testing
N,M = 400,400
mat1 = np.random.randint(0,N,(N,M))
mask = np.random.randint(0,2,(N,M)).astype(bool)

print(np.array_equal(nanny(mat1, mask), manual(mat1, mask))) # True
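As a possible alternative that is not part of the answer above: assuming NumPy 1.20 or newer, np.mean accepts a where argument that restricts the reduction to selected elements. A minimal sketch with made-up data, where `valid` is True for the entries to include:

```python
import numpy as np

# Hypothetical 3x3 integer data; 0 marks an invalid entry.
mat = np.array([[1, 2, 3],
                [4, 0, 6],
                [0, 8, 9]])
valid = mat != 0

# Requires NumPy >= 1.20: only elements where `valid` is True enter the mean.
means = np.mean(mat, axis=0, where=valid)

# Same result as the manual sum/count approach.
manual_means = (mat * valid).sum(axis=0) / valid.sum(axis=0)
print(means)
print(np.allclose(means, manual_means))
```

This also works on integer arrays without an explicit float copy. Whether it is faster than the manual version is not something this sketch measures.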
