How can I use numpy.mean() on ndarray with a condition?

Question

Is there a way to filter the values of an ndarray and, at the same time, take the mean along a certain axis? Here is an MWE:

import numpy as np
import random

arr = np.ndarray((10, 5))

for i in range(10):
    for j in range(5):
        arr[i, j] = random.randint(0, 5)

mean = arr[arr < 0.7].mean(axis = 0)

This does not work because arr[arr < 0.7] flattens the array.
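To make the flattening concrete, here is a minimal sketch. The small array and the threshold of 4 are made up for illustration and are not part of the MWE above.

```python
import numpy as np

# Illustrative 3x2 array (values chosen arbitrarily for this sketch).
arr = np.array([[1.0, 5.0],
                [2.0, 3.0],
                [6.0, 0.0]])

# Boolean indexing with a 2-D mask returns a 1-D array of the selected
# elements in row-major order, so the column structure is lost.
selected = arr[arr < 4]
print(selected.shape)  # (4,)
print(selected)        # the values 1, 2, 3, 0
```

Because the result is one-dimensional, calling .mean(axis=0) on it averages over all selected elements rather than per column.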

Any other ideas?

Answer 1

One approach is to build a boolean mask from the comparison against the threshold, sum the valid elements along axis=0, and divide each sum by the number of valid elements in that column. This gives the per-column mean over the valid elements only.

Thus, the implementation would be something like this -

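thresh = 4  # threshold value, as in the sample run below
mask = arr < thresh  # True where an element takes part in the mean
out = np.einsum('ij,ij->j', arr, mask) / mask.sum(axis=0)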

Sample step-by-step run -

In [49]: arr
Out[49]: 
array([[ 4.,  3.,  2.,  5.,  0.],
       [ 1.,  1.,  5.,  1.,  4.],
       [ 2.,  5.,  1.,  2.,  4.],
       [ 0.,  4.,  0.,  0.,  1.],
       [ 2.,  3.,  0.,  1.,  2.],
       [ 4.,  5.,  3.,  3.,  0.],
       [ 5.,  0.,  0.,  4.,  1.],
       [ 4.,  2.,  0.,  5.,  3.],
       [ 5.,  0.,  0.,  5.,  0.],
       [ 0.,  1.,  0.,  2.,  1.]])

In [50]: thresh = 4

In [51]: mask = arr < thresh

In [52]: mask
Out[52]: 
array([[False,  True,  True, False,  True],
       [ True,  True, False,  True, False],
       [ True, False,  True,  True, False],
       [ True, False,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False,  True,  True, False,  True],
       [False,  True,  True, False,  True],
       [False,  True,  True, False,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

In [53]: np.einsum('ij,ij->j',arr,mask)
Out[53]: array([  5.,  10.,   6.,   9.,   8.])

In [54]: np.einsum('ij,ij->j',arr,mask)/mask.sum(axis = 0)
Out[54]: array([ 1.        ,  1.42857143,  0.66666667,  1.5       ,  1.        ])
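The run above can be reproduced as a standalone script. The sketch below reuses the sample array and threshold, and adds a cross-check that is not in the original answer: a plain per-column loop that averages only the elements passing the mask.

```python
import numpy as np

# Sample array and threshold copied from the step-by-step run.
arr = np.array([[4, 3, 2, 5, 0],
                [1, 1, 5, 1, 4],
                [2, 5, 1, 2, 4],
                [0, 4, 0, 0, 1],
                [2, 3, 0, 1, 2],
                [4, 5, 3, 3, 0],
                [5, 0, 0, 4, 1],
                [4, 2, 0, 5, 3],
                [5, 0, 0, 5, 0],
                [0, 1, 0, 2, 1]], dtype=float)
thresh = 4

mask = arr < thresh
out = np.einsum('ij,ij->j', arr, mask) / mask.sum(axis=0)

# Cross-check: average the valid elements of each column one at a time.
expected = np.array([arr[mask[:, j], j].mean() for j in range(arr.shape[1])])

print(out)
print(np.allclose(out, expected))  # True
```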

For readability, we can alternatively use simple elementwise multiplication and summation, like so -

out = (arr*mask).sum(axis = 0)/mask.sum(axis = 0)
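One case the one-liner does not cover is a column in which no element passes the condition: the count is then zero and the division yields NaN with a runtime warning. The following is a hedged sketch of one way to handle that; the helper name masked_column_mean and the choice of NaN as the fill value are assumptions for illustration, not part of the answer.

```python
import numpy as np

def masked_column_mean(arr, mask):
    """Mean along axis 0 over elements where mask is True.

    Hypothetical helper: columns with no valid element get NaN.
    """
    sums = (arr * mask).sum(axis=0)    # sum of the valid elements per column
    counts = mask.sum(axis=0)          # number of valid elements per column
    out = np.full(sums.shape, np.nan)  # NaN wherever a column has no valid element
    # Divide only where the count is non-zero; other entries keep the NaN.
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

# Illustrative data: no element of the second column is below 4.
arr = np.array([[1.0, 5.0],
                [3.0, 6.0]])
result = masked_column_mean(arr, arr < 4)
print(result)  # first column 2.0, second column NaN
```

If arr itself can contain NaN or inf, np.where(mask, arr, 0) in place of arr * mask avoids propagating those values into the sums.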

Answer 2

You could use masked arrays here:

ok_mask = arr < 0.7
np.ma.masked_where(~ok_mask, arr).mean(axis=0)

If an entire slice along axis 0 is excluded (that is, no element of a column satisfies the condition), the result will contain np.ma.masked in that entry.
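A small sketch of that behaviour follows, with made-up data and a threshold of 4 chosen so that one column is excluded entirely. Converting the result to a plain ndarray with filled(np.nan) is one option among several, shown here as an assumption about what a caller might want.

```python
import numpy as np

# Illustrative data: no element of the second column is below 4.
arr = np.array([[1.0, 5.0],
                [3.0, 6.0]])

ok_mask = arr < 4
col_means = np.ma.masked_where(~ok_mask, arr).mean(axis=0)
print(col_means)  # second entry is shown as masked

# Replace masked entries with NaN to get an ordinary ndarray.
plain = col_means.filled(np.nan)
print(plain)
```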

2 answers

solution1
1 ACCEPTED 2017-02-14 11:45:43

solution2
1 2017-02-14 12:01:41