How can I use numpy.mean() on ndarray with a condition?

Is there a way to filter the values of an ndarray and, at the same time, take the mean along a certain axis? Here is an MWE:

import numpy as np

# 10x5 array of random integers in [0, 5], stored as floats
arr = np.random.randint(0, 6, size=(10, 5)).astype(float)

# Boolean indexing returns a 1-D array, so axis=0 no longer refers to the rows
mean = arr[arr < 0.7].mean(axis=0)

This does not work because arr[arr < 0.7] flattens the array.

Any other idea?

One approach is to build a boolean mask from the comparison against the given threshold, sum the valid elements along axis=0, and divide each sum by the number of valid elements that went into it. The result is the per-column average over the valid values only.

Thus, the implementation would be something like this -

mask = arr < thresh
out = np.einsum('ij,ij->j',arr,mask)/mask.sum(axis = 0)

Sample step-by-step run -

In [49]: arr
Out[49]: 
array([[ 4.,  3.,  2.,  5.,  0.],
       [ 1.,  1.,  5.,  1.,  4.],
       [ 2.,  5.,  1.,  2.,  4.],
       [ 0.,  4.,  0.,  0.,  1.],
       [ 2.,  3.,  0.,  1.,  2.],
       [ 4.,  5.,  3.,  3.,  0.],
       [ 5.,  0.,  0.,  4.,  1.],
       [ 4.,  2.,  0.,  5.,  3.],
       [ 5.,  0.,  0.,  5.,  0.],
       [ 0.,  1.,  0.,  2.,  1.]])

In [50]: thresh = 4

In [51]: mask = arr < thresh

In [52]: mask
Out[52]: 
array([[False,  True,  True, False,  True],
       [ True,  True, False,  True, False],
       [ True, False,  True,  True, False],
       [ True, False,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False,  True,  True, False,  True],
       [False,  True,  True, False,  True],
       [False,  True,  True, False,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

In [53]: np.einsum('ij,ij->j',arr,mask)
Out[53]: array([  5.,  10.,   6.,   9.,   8.])

In [54]: np.einsum('ij,ij->j',arr,mask)/mask.sum(axis = 0)
Out[54]: array([ 1.        ,  1.42857143,  0.66666667,  1.5       ,  1.        ])

For readability, we can alternatively use simple elementwise multiplication and summing, like so -

out = (arr*mask).sum(axis = 0)/mask.sum(axis = 0)
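
As a self-contained check of the formula above, here is a minimal sketch that reuses the sample array from the step-by-step run. The helper name masked_column_mean is hypothetical, and returning NaN for a column with no valid entries is an assumption about the desired behavior, not part of the original answer (the plain division would divide by zero there). The last line assumes NumPy 1.20 or later, where np.mean accepts a where argument.

```python
import numpy as np

# Sample array from the step-by-step run above
arr = np.array([[4, 3, 2, 5, 0],
                [1, 1, 5, 1, 4],
                [2, 5, 1, 2, 4],
                [0, 4, 0, 0, 1],
                [2, 3, 0, 1, 2],
                [4, 5, 3, 3, 0],
                [5, 0, 0, 4, 1],
                [4, 2, 0, 5, 3],
                [5, 0, 0, 5, 0],
                [0, 1, 0, 2, 1]], dtype=float)


def masked_column_mean(a, mask):
    """Mean along axis 0 over entries where mask is True (hypothetical helper)."""
    sums = (a * mask).sum(axis=0)    # masked-out entries contribute 0
    counts = mask.sum(axis=0)        # number of valid entries per column
    out = np.full(a.shape[1], np.nan)
    # Divide only where at least one entry is valid; other columns stay NaN
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


thresh = 4
mask = arr < thresh
out = masked_column_mean(arr, mask)

# Assumes NumPy >= 1.20: np.mean can apply the mask directly
out_where = np.mean(arr, axis=0, where=mask)
print(out)
print(out_where)
```

Both results should agree with Out[54] above; the helper differs from the one-liner only in how it treats columns where nothing passes the filter.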

You could use masked arrays here:

ok_mask = arr < 0.7
np.ma.masked_where(~ok_mask, arr).mean(axis=0)

If an entire slice along the 0 axis is excluded, the corresponding entry of the result is masked (np.ma.masked).
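
To illustrate the fully excluded case, here is a small sketch with a made-up 3x2 array and a threshold of 4 (both are assumptions chosen so that the second column has no valid entries). It also shows one way to turn the masked result back into a plain ndarray using filled.

```python
import numpy as np

# Made-up data: every value in the second column fails the filter
arr = np.array([[1.0, 5.0],
                [3.0, 6.0],
                [2.0, 7.0]])

ok_mask = arr < 4
col_means = np.ma.masked_where(~ok_mask, arr).mean(axis=0)

# The second entry is masked because no value in that column was kept
print(col_means)

# Convert to a plain ndarray, using NaN for the masked entries
as_nan = col_means.filled(np.nan)
print(as_nan)
```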
