
Numpy: checking if an element in a multidimensional array is in a tuple

It seems I still struggle with the "in" operator in numpy. Here's the situation:

>>> a = np.random.randint(1, 10, (2, 2, 3))
>>> a
array([[[9, 8, 8],
        [4, 9, 1]],

       [[6, 6, 3],
        [9, 3, 5]]])

I would like to get the indexes of those triplets whose second element is in (6, 8). The way I intuitively tried is:

>>> a[:, :, 1] in (6, 8)
ValueError: The truth value of an array with more than one element...

My ultimate goal is to replace the value at those positions with the number preceding it multiplied by two. Using the example above, a should become:

array([[[9, 18, 8],   #8 @ pos #2 --> replaced by 9 @ pos #1 by 2
        [4, 9, 1]],

       [[6, 12, 3],   #6 @ pos #2 --> replaced by 6 @ pos #1 by 2
        [9, 3, 5]]])

Thank you in advance for your advice and time!

Here's a method that works for a tuple of arbitrary length. It uses the numpy.in1d function.

import numpy as np
np.random.seed(1)

a = np.random.randint(1, 10, (2, 2, 3))
print(a)

check_tuple = (6, 9, 1)

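# in1d flattens its input, so bool_array is 1-D with one entry per triplet
bool_array = np.in1d(a[:,:,1], check_tuple)
ind = np.where(bool_array)[0]
# flatten the first and second columns the same way; for this array
# reshape returns views, so writing to a1 modifies a in place
a0 = a[:,:,0].reshape((len(bool_array), ))
a1 = a[:,:,1].reshape((len(bool_array), ))
a1[ind] = a0[ind] * 2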

print(a)

And the output:

[[[6 9 6]
  [1 1 2]]

 [[8 7 3]
  [5 6 3]]]

[[[ 6 12  6]
  [ 1  2  2]]

 [[ 8  7  3]
  [ 5 10  3]]]
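
On newer NumPy releases the same idea can probably be written more directly with np.isin, which returns a mask with the same shape as its first argument, so no reshaping is needed (np.in1d is deprecated as of NumPy 2.0). A minimal sketch, assuming NumPy 1.13 or later; the array is the seeded one printed above, written out explicitly, and the name mask is only illustrative:

```python
import numpy as np

# Same values as the seeded array printed above, written out explicitly
a = np.array([[[6, 9, 6],
               [1, 1, 2]],

              [[8, 7, 3],
               [5, 6, 3]]])
check_tuple = (6, 9, 1)

# np.isin keeps the (2, 2) shape of a[:, :, 1]
mask = np.isin(a[:, :, 1], check_tuple)

# Boolean mask on the first two axes, integer index on the last one
a[mask, 1] = a[mask, 0] * 2
print(a)
```

Because the mask already has the shape of the first two axes, it can index a directly and the assignment happens in place.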

There is another method, based on a lookup table, which I learned from one of the developers of Cellprofiler. First you create a lookup table (LUT) with one entry for every integer from 0 up to the largest number in your array. For each possible array value, the LUT holds either True or False. Example:

import numpy as np

# create a large volume image with random numbers
a = np.random.randint(1, 1000, (50, 1000, 1000))
labels_to_find = np.unique(np.random.randint(1, 1000, 500))

# create filter mask LUT
# (values are used as indices, so they must be non-negative integers,
# and no value in obs may exceed the maximum of inputarr)
def find_mask_LUT(inputarr, obs):
    keep = np.zeros(np.max(inputarr)+1, bool)
    keep[np.array(obs)] = True
    return keep[inputarr]

# This will return a mask that is the
# same shape as a, with True where a is one of the
# labels we look for, False otherwise
find_mask_LUT(a, labels_to_find)

This works really fast (much faster than np.in1d), and the speed does not depend on the number of objects.
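
As a hedged variation on the lookup-table idea, the table can be sized to cover the labels as well as the array, so a label larger than anything in the array does not raise an IndexError. This is only a sketch: the function name find_mask_lut and the smaller test array are my own choices, it assumes non-empty inputs of non-negative integers, and the result is cross-checked against np.isin:

```python
import numpy as np

def find_mask_lut(inputarr, obs):
    """Boolean mask, True where inputarr holds one of the values in obs.

    Assumes non-empty inputs of non-negative integers (values are used
    as indices into the lookup table).
    """
    inputarr = np.asarray(inputarr)
    obs = np.asarray(obs)
    # Size the table to cover both arrays, so labels larger than
    # inputarr.max() do not raise an IndexError
    size = max(int(inputarr.max()), int(obs.max())) + 1
    keep = np.zeros(size, dtype=bool)
    keep[obs] = True
    # Fancy indexing looks up every element of inputarr in the table
    return keep[inputarr]

rng = np.random.default_rng(0)
a = rng.integers(1, 1000, size=(5, 100, 100))
labels_to_find = np.unique(rng.integers(1, 1000, size=500))

mask = find_mask_lut(a, labels_to_find)
# Cross-check against np.isin
assert np.array_equal(mask, np.isin(a, labels_to_find))
```

The table costs memory proportional to the largest value, so this approach suits label images with a bounded range of small integers rather than arbitrary data.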

import numpy as np
a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])

ind=(a[:,:,1]<=8) & (a[:,:,1]>=6)
a[ind,1]=a[ind,0]*2
print(a)

yields

[[[ 9 18  8]
  [ 4  9  1]]

 [[ 6 12  3]
  [ 9  3  5]]]

If you wish to check for membership in a set which is not a simple range, then I like both mac's idea of using a Python loop and bellamyj's idea of using np.in1d. Which is faster depends on the size of check_tuple:

test.py:

import numpy as np
np.random.seed(1)

N = 10
a = np.random.randint(1, 1000, (2, 2, 3))
check_tuple = np.random.randint(1, 1000, N)

def using_in1d(a):
    idx = np.in1d(a[:,:,1], check_tuple)
    idx=idx.reshape(a[:,:,1].shape)
    a[idx,1] = a[idx,0] * 2
    return a

def using_in(a):
    idx = np.zeros(a[:,:,0].shape,dtype=bool)
    for n in check_tuple:
        idx |= a[:,:,1]==n
    a[idx,1] = a[idx,0]*2
    return a

# both functions modify their argument in place, so compare on copies
assert np.allclose(using_in1d(a.copy()), using_in(a.copy()))

When N = 10, using_in is slightly faster:

% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 156 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
10000 loops, best of 3: 143 usec per loop

When N = 100, using_in1d is much faster:

% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 171 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
1000 loops, best of 3: 1.15 msec per loop
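
The comparison above can also be run from a single script with the timeit module. This is only a sketch under my own assumptions: the helper names mask_isin and mask_loop and the array size are hypothetical, np.isin stands in for np.in1d, and the helpers only build the mask rather than modifying the array. No timings are quoted here because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
a = rng.integers(1, 1000, size=(200, 200, 3))

def mask_isin(a, values):
    # Vectorized membership test, same shape as a[:, :, 1]
    return np.isin(a[:, :, 1], values)

def mask_loop(a, values):
    # One equality comparison per candidate value, OR-ed together
    idx = np.zeros(a.shape[:2], dtype=bool)
    for n in values:
        idx |= a[:, :, 1] == n
    return idx

for N in (10, 100):
    values = rng.integers(1, 1000, size=N)
    # Both approaches must agree before their timings are compared
    assert np.array_equal(mask_isin(a, values), mask_loop(a, values))
    for fn in (mask_isin, mask_loop):
        t = timeit.timeit(lambda: fn(a, values), number=20)
        print(f"N={N:3d} {fn.__name__}: {t / 20 * 1e6:.0f} usec per call")
```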

Inspired by unutbu's answer I found out this possible solution:

>>> l = (8, 6)
>>> idx = np.zeros((2, 2), dtype=bool)
>>> for n in l:
...     idx |= a[:,:,1] == n
>>> idx
array([[ True, False],
       [ True, False]], dtype=bool)
>>> a[idx]
array([[9, 8, 8],
       [6, 6, 3]])

It does require knowing the dimensions of the array beforehand, though.
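
One way around that, sketched here with the array from the question, is to take the mask shape from the array itself. Using a.shape[:-1] and the ellipsis index is my own suggestion rather than part of the answer above:

```python
import numpy as np

a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])
l = (8, 6)

# Derive the mask shape from a instead of writing (2, 2) by hand
idx = np.zeros(a.shape[:-1], dtype=bool)
for n in l:
    # a[..., 1] is the second element of every triplet, for any number of leading axes
    idx |= a[..., 1] == n

print(idx)
print(a[idx])
```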
