
Numpy: checking if an element in a multidimensional array is in a tuple

It seems I still struggle with the "in" operator in numpy. Here's the situation:

>>> a = np.random.randint(1, 10, (2, 2, 3))
>>> a
array([[[9, 8, 8],
        [4, 9, 1]],

       [[6, 6, 3],
        [9, 3, 5]]])

I would like to get the indexes of those triplets whose second element is in (6, 8). The way I intuitively tried is:

>>> a[:, :, 1] in (6, 8)
ValueError: The truth value of an array with more than one element...

My ultimate goal is to replace the number at each of those positions with the number preceding it, multiplied by two. Using the example above, a should become:

array([[[9, 18, 8],   # 8 at position 2 replaced by 9 (position 1) times 2
        [4, 9, 1]],

       [[6, 12, 3],   # 6 at position 2 replaced by 6 (position 1) times 2
        [9, 3, 5]]])

Thank you in advance for your advice and time!

Here's a method that will work for a tuple of arbitrary length. It uses the numpy.in1d function.

import numpy as np
np.random.seed(1)

a = np.random.randint(1, 10, (2, 2, 3))
print(a)

check_tuple = (6, 9, 1)

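# in1d flattens its input, so the result is a 1-D boolean array (length 4 here)
bool_array = np.in1d(a[:,:,1], check_tuple)
ind = np.where(bool_array)[0]
# These reshapes return views of a here, so writing to a1 modifies a in place
a0 = a[:,:,0].reshape((len(bool_array), ))
a1 = a[:,:,1].reshape((len(bool_array), ))
a1[ind] = a0[ind] * 2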

print(a)

And the output:

[[[6 9 6]
  [1 1 2]]

 [[8 7 3]
  [5 6 3]]]

[[[ 6 12  6]
  [ 1  2  2]]

 [[ 8  7  3]
  [ 5 10  3]]]
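
On NumPy 1.13 and later, np.isin performs the same membership test but keeps the shape of its input, so the reshaping step is not needed; np.in1d itself is deprecated as of NumPy 2.0. A minimal sketch, using the array from the question rather than the seeded random one (the names mask and positions are only illustrative):

```python
import numpy as np

a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])
check_tuple = (6, 8)

# Boolean mask with the same shape as a[:, :, 1], i.e. (2, 2)
mask = np.isin(a[:, :, 1], check_tuple)

# (i, j) index pairs of the triplets whose second element matched
positions = np.argwhere(mask)

# Overwrite the second element with twice the first, only where mask is True
a[mask, 1] = a[mask, 0] * 2
print(a)
```

Here positions holds the indexes the question asks for, and a ends up as the array the question gives as the desired result.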

There is another method based on a lookup table, which I learned from one of the developers of CellProfiler. First you create a lookup table (LUT) with one entry for each possible array value, i.e. of size max + 1. Each entry is True if that value is one you are looking for and False otherwise. Example:

import numpy as np

# create a large volume image with random numbers
a = np.random.randint(1, 1000, (50, 1000, 1000))
labels_to_find = np.unique(np.random.randint(1, 1000, 500))

# create filter mask LUT
def find_mask_LUT(inputarr, obs):
    # one boolean entry per possible value, 0 .. max(inputarr)
    keep = np.zeros(np.max(inputarr) + 1, bool)
    keep[np.array(obs)] = True
    # index the table with the array itself to get the mask
    return keep[inputarr]

# This will return a mask that is the
# same shape as a, with True where a is one of the
# labels we look for, False otherwise
find_mask_LUT(a, labels_to_find)

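This is very fast (much faster than np.in1d), and the speed does not depend on the number of labels being looked up. It does assume the array holds non-negative integers, and the table takes memory proportional to the largest value.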
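
To tie the lookup table back to the original question, here is a small sketch that applies it to the 2x2x3 array from the question. The function name find_mask_lut is hypothetical, and sizing the table to cover the largest label as well is my own assumption, added so that a label larger than anything in the array does not raise an IndexError:

```python
import numpy as np

def find_mask_lut(inputarr, obs):
    # Assumes non-negative integers in both inputarr and obs, and a non-empty obs
    obs = np.asarray(obs)
    # Size the table to cover the largest value in either input
    size = max(int(inputarr.max()), int(obs.max())) + 1
    keep = np.zeros(size, dtype=bool)
    keep[obs] = True
    # Fancy-index the table with the array; the result has inputarr's shape
    return keep[inputarr]

a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])

mask = find_mask_lut(a[:, :, 1], (6, 8))
a[mask, 1] = a[mask, 0] * 2
print(a)
```

The mask has the same shape as a[:, :, 1], so it can be used directly for the in-place update.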

import numpy as np
a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])

ind=(a[:,:,1]<=8) & (a[:,:,1]>=6)
a[ind,1]=a[ind,0]*2
print(a)

yields

[[[ 9 18  8]
  [ 4  9  1]]

 [[ 6 12  3]
  [ 9  3  5]]]

If you wish to check for membership in a set which is not a simple range, then I like both mac's idea of using a Python loop and bellamyj's idea of using np.in1d. Which is faster depends on the size of check_tuple:

test.py:

import numpy as np
np.random.seed(1)

N = 10
a = np.random.randint(1, 1000, (2, 2, 3))
check_tuple = np.random.randint(1, 1000, N)

def using_in1d(a):
    idx = np.in1d(a[:,:,1], check_tuple)
    idx=idx.reshape(a[:,:,1].shape)
    a[idx,1] = a[idx,0] * 2
    return a

def using_in(a):
    idx = np.zeros(a[:,:,0].shape,dtype=bool)
    for n in check_tuple:
        idx |= a[:,:,1]==n
    a[idx,1] = a[idx,0]*2
    return a

# Compare on copies: both functions modify their argument in place and return it
assert np.allclose(using_in1d(a.copy()), using_in(a.copy()))

When N = 10, using_in is slightly faster:

% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 156 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
10000 loops, best of 3: 143 usec per loop

When N = 100, using_in1d is much faster:

% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 171 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
1000 loops, best of 3: 1.15 msec per loop

Inspired by unutbu's answer, I came up with this possible solution:

>>> l = (8, 6)
>>> idx = np.zeros((2, 2), dtype=bool)
>>> for n in l:
...     idx |= a[:,:,1] == n
>>> idx
array([[ True, False],
       [ True, False]], dtype=bool)
>>> a[idx]
array([[9, 8, 8],
       [6, 6, 3]])

As written, it requires knowing the dimensions of the array beforehand, though.
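
One way around the hard-coded (2, 2) is to take the mask shape from the array itself. This is only a sketch of the same loop, with the illustrative names targets and second, and the question's array assumed as input:

```python
import numpy as np

a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])
targets = (8, 6)

# View of the second element of every triplet, whatever the leading dimensions are
second = a[..., 1]

# Start from an all-False mask with the same shape as that view
idx = np.zeros(second.shape, dtype=bool)
for n in targets:
    idx |= second == n

print(a[idx])
```

Because the shape comes from a[..., 1], the same code works for arrays with any number of leading dimensions.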
