
numpy mask array limiting the frequency of masked values

Starting from an array:

a = np.array([1,1,1,2,3,4,5,5])

and a filter:

m = np.array([1,5])

I am now building a mask with:

b = np.in1d(a,m)

that correctly returns:

array([ True,  True,  True, False, False, False,  True,  True], dtype=bool)
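As a quick illustration of what this mask selects, the sketch below counts how many times each value passes the filter. It uses np.isin, the newer equivalent of np.in1d for this 1-D case; the names values and counts are only illustrative and are not part of the original code.

```python
import numpy as np

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])

# Same mask as np.in1d(a, m) for 1-D input
b = np.isin(a, m)

# Count how often each masked value occurs
values, counts = np.unique(a[b], return_counts=True)
print(values, counts)  # [1 5] [3 2]
```

The value 1 is selected three times and 5 twice, which is the per-value frequency that the rest of the question wants to cap.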

I need to limit the number of True values per unique value to a maximum of 2, so that 1 is masked only twice instead of three times. The resulting mask would then be one of the following (it does not matter which occurrences are kept):

array([ True,  True,  False, False, False, False,  True,  True], dtype=bool)

or

array([ True,  False,  True, False, False, False,  True,  True], dtype=bool)

or

array([ False,  True,  True, False, False, False,  True,  True], dtype=bool)

Ideally, this would be a kind of "random" masking with a limit on how often each value is selected. So far I have tried randomly selecting from the original unique elements of the array, but the resulting mask still selects every matching value, regardless of its frequency.

For the generic case with an unsorted input array, here's one approach based on np.searchsorted -

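import numpy as np
N = 2  # maximum number of True values allowed per unique value

# Indices that sort a, so searchsorted can locate each value of m
sortidx = a.argsort()
# Sorted-order positions of the first N candidate slots for each value of m
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(N)
# Number of occurrences of each value of m in a, capped at N
lim_counts = (a[:,None] == m).sum(0).clip(max=N)
# Keep only the slots that correspond to real occurrences
idx_clipped = idx[lim_counts[:,None] > np.arange(N)]
# Mask in sorted order, mapped back to the original order of a
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]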
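The same steps can be wrapped in a function and applied to the array from the question. This is a minimal sketch, not the answer's exact code: the function name limited_mask and the parameter n are hypothetical, np.isin stands in for np.in1d, and a stable sort is requested so that the earliest occurrences of each value are the ones kept.

```python
import numpy as np

def limited_mask(a, m, n):
    """Mask elements of `a` that appear in `m`, keeping at most `n` per value."""
    a = np.asarray(a)
    m = np.asarray(m)
    # Stable sort (an assumption of this sketch) keeps equal values in input order
    sortidx = a.argsort(kind="stable")
    # Start of each value's run in sorted order, plus n candidate slots
    start = np.searchsorted(a, m, sorter=sortidx)
    idx = start[:, None] + np.arange(n)
    # Real occurrence counts, capped at n
    lim_counts = (a[:, None] == m).sum(0).clip(max=n)
    idx_clipped = idx[lim_counts[:, None] > np.arange(n)]
    # Mask in sorted order, then undo the sort
    sorted_mask = np.isin(np.arange(a.size), idx_clipped)
    return sorted_mask[sortidx.argsort()]

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])
mask = limited_mask(a, m, 2)
print(mask)  # [ True  True False False False False  True  True]
```

Note that the comparison a[:, None] == m builds a temporary array of shape (len(a), len(m)), so memory use grows with the product of the two lengths.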

Sample run -

In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])

In [38]: m
Out[38]: [1, 2, 5]

In [39]: N
Out[39]: 2

In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
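The question also asks for the kept occurrences to be chosen at random. One possible way to get that, sketched here as an assumption rather than as part of the approach above, is to shuffle the positions before a stable sort, so that ties within each value end up in random order, and then keep the first n of each run. The function name random_limited_mask and the seed parameter are hypothetical.

```python
import numpy as np

def random_limited_mask(a, m, n, seed=None):
    """Mask elements of `a` found in `m`, keeping at most `n` random occurrences per value."""
    a = np.asarray(a)
    rng = np.random.default_rng(seed)
    # Shuffle positions, then stable-sort by value: ties stay in shuffled order
    perm = rng.permutation(a.size)
    order = perm[np.argsort(a[perm], kind="stable")]
    sorted_a = a[order]
    # Mark where each run of equal values starts
    is_start = np.ones(a.size, dtype=bool)
    is_start[1:] = sorted_a[1:] != sorted_a[:-1]
    # Rank of each element inside its run: 0, 1, 2, ...
    positions = np.arange(a.size)
    run_start = np.maximum.accumulate(np.where(is_start, positions, 0))
    rank = positions - run_start
    # Keep elements that are in m and among the first n of their run
    keep_sorted = np.isin(sorted_a, m) & (rank < n)
    out = np.zeros(a.size, dtype=bool)
    out[order] = keep_sorted
    return out

a = np.array([5, 1, 4, 2, 1, 3, 5, 1])
mask = random_limited_mask(a, [1, 2, 5], 2, seed=0)
print(mask)
```

Which two of the three 1s are kept depends on the seed, but the per-value totals are fixed: two for 1, one for 2, and two for 5.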


 