Taking mean of numpy ndarray with masked elements

Question

I have an MxN array of values taken from an experiment. Some of these values are invalid and are set to 0 to indicate this. I can construct a mask of valid/invalid values using

mask = (mat1 == 0) & (mat2 == 0)

which produces an MxN array of bool. Note that the masked locations do not neatly follow columns or rows of the matrix, so simply cropping the matrix is not an option.

Now, I want to take the mean along one axis of my array (e.g. end up with a 1xN array) while excluding those invalid values from the mean calculation. Intuitively I thought

 np.mean(mat1[mask],axis=1)

should do it, but the mat1[mask] operation produces a 1D array that appears to contain just the elements where mask is True, which doesn't help when I only want a mean across one dimension of the array.
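
A minimal sketch of that flattening, using a small made-up array. The values and the name valid are assumptions for illustration only, not the experimental data:

```python
import numpy as np

# Hypothetical 2x3 array; 0 marks an invalid reading.
mat1 = np.array([[1, 0, 3],
                 [4, 5, 0]])
valid = mat1 != 0      # boolean array with the same shape as mat1

flat = mat1[valid]     # boolean indexing returns the selected elements as 1D
print(flat.shape)      # (4,)
print(flat)            # [1 3 4 5]
```

Because the row/column structure is gone after indexing, there is no second axis left to average over, which is why passing axis=1 fails here.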

Is there a 'python-esque' or numpy way to do this? I suppose I could use the mask to set masked elements to NaN and use np.nanmean, but that still feels kind of clunky. Is there a way to do this 'cleanly'?

Answer 1

I think the best way to do this would be something along the lines of:

masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)

Then take the mean with

masked.mean(axis=1)
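
For a self-contained illustration, here is a minimal sketch of the same idea. The small arrays are made up, and invalid, row_means and col_means are names chosen for this example only:

```python
import numpy as np

# Hypothetical data: 0 in both arrays marks an invalid reading.
mat1 = np.array([[1.0, 0.0, 3.0],
                 [4.0, 5.0, 0.0]])
mat2 = np.array([[2.0, 0.0, 1.0],
                 [7.0, 8.0, 0.0]])

invalid = (mat1 == 0) & (mat2 == 0)         # True where the value should be ignored
masked = np.ma.masked_where(invalid, mat1)  # masked array that keeps the 2D shape

row_means = masked.mean(axis=1)             # one mean per row, masked entries skipped
col_means = masked.mean(axis=0)             # one mean per column

print(row_means)  # [2.0 4.5]
print(col_means)  # [2.5 5.0 3.0]
```

The result is itself a masked array. A row or column whose entries are all masked comes back as a masked element rather than a number, so calling .filled(np.nan) on the result may be useful if a plain ndarray is needed downstream.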

Answer 2

One similarly clunky but efficient way is to multiply your array by the mask, which sets the masked values to zero. You then have to divide by the number of non-masked values manually, hence the clunkiness. But this works with integer-valued arrays, which can't be said of the NaN approach. It also seems to be the fastest for both small and larger arrays (faster than the masked-array solution in another answer, too). Note that in the code below, mask is True for the values to keep:

import numpy as np

def nanny(mat, mask):
    mat = mat.astype(float).copy() # don't mutate the original
    mat[~mask] = np.nan            # mask values
    return np.nanmean(mat, axis=0) # compute mean

def manual(mat, mask):
    # zero masked values, divide by number of nonzeros
    return (mat*mask).sum(axis=0)/mask.sum(axis=0)

# set up dummy data for testing
N,M = 400,400
mat1 = np.random.randint(0,N,(N,M))
mask = np.random.randint(0,2,(N,M)).astype(bool)

print(np.array_equal(nanny(mat1, mask), manual(mat1, mask))) # True
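
To check the speed claim on your own machine, something like the following rough timing sketch could be used. The masked_array helper, the seeded generator and the repeat count are assumptions added for this comparison, and the timings will vary with hardware and NumPy version:

```python
import timeit
import numpy as np

def nanny(mat, mask):
    mat = mat.astype(float)        # astype returns a new array, original untouched
    mat[~mask] = np.nan            # mark values to ignore
    return np.nanmean(mat, axis=0)

def manual(mat, mask):
    # zero the ignored values, then divide by the count of kept values
    return (mat * mask).sum(axis=0) / mask.sum(axis=0)

def masked_array(mat, mask):
    # np.ma expects True where a value is *invalid*, hence the inversion
    return np.ma.masked_array(mat, mask=~mask).mean(axis=0).filled(np.nan)

# dummy data: mask is True for the values to keep
rng = np.random.default_rng(0)
N, M = 400, 400
mat1 = rng.integers(0, N, (N, M))
mask = rng.integers(0, 2, (N, M)).astype(bool)

funcs = (nanny, manual, masked_array)
results = {f.__name__: f(mat1, mask) for f in funcs}

repeats = 20
for f in funcs:
    seconds = timeit.timeit(lambda: f(mat1, mask), number=repeats)
    print(f"{f.__name__:>12}: {seconds / repeats * 1e3:.3f} ms per call")
```

Keep in mind that manual divides by zero for any column with no kept values, which yields nan (or inf) plus a runtime warning instead of a masked entry.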

2 answers

solution1
1 ACCEPTED 2018-10-16 20:27:38

solution2
1 2018-10-16 20:35:26
