numpy mask array limiting the frequency of masked values

Starting from an array:

import numpy as np
a = np.array([1,1,1,2,3,4,5,5])

and a filter:

m = np.array([1,5])

I am now building a mask with:

b = np.in1d(a,m)

which correctly returns:

array([ True,  True,  True, False, False, False,  True,  True], dtype=bool)

I need to limit the number of True entries per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask would then be one of the following (it does not matter which occurrences of a value keep their True):

array([ True,  True,  False, False, False, False,  True,  True], dtype=bool)

or

array([ True,  False,  True, False, False, False,  True,  True], dtype=bool)

or

array([ False,  True,  True, False, False, False,  True,  True], dtype=bool)

Ideally this is a kind of "random" masking that caps how often each value is masked. So far I have tried randomly selecting from the original unique elements of the array, but the resulting mask still selects every True value regardless of how often the value occurs.
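The requirement can also be written down as a check. The sketch below is only one reading of the question: it assumes a valid mask is True on exactly min(count, 2) occurrences of each filter value and False everywhere else. The helper name respects_limit is hypothetical, and np.isin is used here in place of np.in1d.

```python
import numpy as np

def respects_limit(a, m, mask, n):
    """Hypothetical helper: True if mask marks only values from m,
    and exactly min(count, n) occurrences of each of them."""
    a = np.asarray(a)
    mask = np.asarray(mask, dtype=bool)
    # A True on a value that is not in the filter is never allowed
    if mask[~np.isin(a, m)].any():
        return False
    for value in np.unique(m):
        hits = a == value
        if mask[hits].sum() != min(hits.sum(), n):
            return False
    return True

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])
candidates = [
    [True, True, False, False, False, False, True, True],
    [True, False, True, False, False, False, True, True],
    [False, True, True, False, False, False, True, True],
]
print([respects_limit(a, m, c, 2) for c in candidates])  # the three masks above
print(respects_limit(a, m, np.isin(a, m), 2))            # the unrestricted mask
```

Under this reading the three masks listed above all pass, while the unrestricted mask fails because 1 is marked three times.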

For the generic case with an unsorted input array, here's one approach based on np.searchsorted:

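N = 2  # maximum number of True values allowed per value in m

# Sort a so that equal values sit next to each other
sortidx = a.argsort()
# First sorted position of each value in m, plus the N-1 positions after it
idx = np.searchsorted(a, m, sorter=sortidx)[:, None] + np.arange(N)
# Number of occurrences of each value of m in a, capped at N
lim_counts = (a[:, None] == m).sum(0).clip(max=N)
# Keep only as many candidate positions per value as it actually occurs
idx_clipped = idx[lim_counts[:, None] > np.arange(N)]
# Flag those sorted positions, then map the mask back to the original order
out = np.in1d(np.arange(a.size), idx_clipped)[sortidx.argsort()]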

Sample run:

In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])

In [38]: m
Out[38]: [1, 2, 5]

In [39]: N
Out[39]: 2

In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
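The approach above always keeps the first N occurrences of each value in sorted order. Since the question mentions "random" masking, here is a minimal sketch of a randomized variant. It is not part of the answer above: the function name limited_random_mask and the seed parameter are hypothetical, and it uses a plain loop over the filter values instead of the vectorized np.searchsorted trick.

```python
import numpy as np

def limited_random_mask(a, m, n, seed=None):
    """Hypothetical helper: mask entries of a that appear in m,
    keeping at most n randomly chosen occurrences per value."""
    a = np.asarray(a)
    rng = np.random.default_rng(seed)
    out = np.zeros(a.size, dtype=bool)
    for value in np.unique(m):
        positions = np.flatnonzero(a == value)  # every occurrence of value
        keep = rng.permutation(positions)[:n]   # random subset, size <= n
        out[keep] = True
    return out

a = np.array([5, 1, 4, 2, 1, 3, 5, 1])
mask = limited_random_mask(a, [1, 2, 5], n=2, seed=0)
print(mask)
```

Which of the three 1s stays unmasked depends on the seed, but every run marks two of the 1s, the single 2 and both 5s. The loop costs one pass over a per filter value, so the vectorized version is likely the better fit when m is large.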
