
Get all other indices of an ndarray / list

Given an array of values and an array of valid indices into it, I would like to get all the other indices.

I'm looking for a Pythonic way to do it. Here is an example solution, which also clarifies what I'm trying to accomplish:

A = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])  # Array of values. Shape: (7,)
B = np.array([0,3,5])  # Array of indices.

# Looking for a more elegant way to write the following line
C = np.array([i for i in range(len(A)) if i not in B])  # Array indices not in B

# Expected Output: C = [1, 2, 4, 6]

Edit: Benchmarking the solutions

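import numpy as np
from time import time

A = np.ones(10000)
# randint's upper bound is exclusive, so high=len(A) covers indices 0..len(A)-1
# (np.random.random_integers is deprecated and removed in recent NumPy versions)
B = np.random.randint(low=0, high=len(A), size=8000)

# T1: boolean mask
t1 = time()
mask = np.ones(len(A), dtype=bool)
mask[B] = False
C = np.arange(len(A))[mask]
t1 = time() - t1

# T2: np.delete
t2 = time()
C = np.delete(np.arange(A.size), B)
t2 = time() - t2

# T3: list comprehension
t3 = time()
C = np.array([i for i in range(len(A)) if i not in B])
t3 = time() - t3

# T4: set difference (note: C is a set here, not an array)
t4 = time()
C = set(np.arange(len(A))).difference(B)
t4 = time() - t4

print("T1: %.5f" % np.round(t1, 5))
print("T2: %.5f" % np.round(t2, 5))
print("T3: %.5f" % np.round(t3, 5))
print("T4: %.5f" % np.round(t4, 5))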

Results in seconds (values varied as the number of indices in B changed, but T1 always remained the fastest):

T1: 0.00011 <<< Fastest on every run of the script above; T2 was always just slightly behind.
T2: 0.00017
T3: 0.05746 << The list comprehension took the most time, even after removing the np.array call.
T4: 0.00158

  • Conclusion:
    I will be using the second approach above (T2), simply because it is a one-liner and takes (almost) the same time as the fastest approach.

You can use np.delete to remove the entries of B from the full list of indices, which you can create with np.arange:

inds = np.delete(np.arange(A.size), B)

Demo:

In [53]: A = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    ...: B = np.array([0,3,5])

In [54]: inds = np.delete(np.arange(A.size), B)

In [55]: inds
Out[55]: array([1, 2, 4, 6])
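
As a small follow-up sketch (not part of the original answer), the resulting index array can be used directly to pull the remaining values out of A. The name B_dup and its repeated index are assumptions added here, only to illustrate that a duplicated entry in the index array should not change the result.

```python
import numpy as np

A = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
B = np.array([0, 3, 5])

# Indices of A that are not listed in B
inds = np.delete(np.arange(A.size), B)

# Indexing A with those indices selects the remaining values
others = A[inds]

# Hypothetical variant: an index array in which 3 appears twice
B_dup = np.array([0, 3, 3, 5])
inds_dup = np.delete(np.arange(A.size), B_dup)

print(inds, others, inds_dup)
```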

I'm not sure this is Pythonic, but it is more Numpythonic (if that's a thing). First, a membership test on an array (the i not in B check) is a linear scan, so each lookup is O(N) in the length of B. Second, falling back to Python-level iteration (in your list comprehension) defeats the purpose of using numpy arrays in the first place.

A = np.array([1,2,3,4,5,6,7]) 
B = np.array([0,3,5])
mask = np.ones(len(A), dtype=bool)
mask[B] = False
not_in_b = np.arange(len(A))[mask]
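
The mask steps above could be wrapped in a small helper, sketched here under the assumption that only the array length and the index array are needed. The function name other_indices is hypothetical, and np.flatnonzero(mask) is used as an equivalent of np.arange(n)[mask].

```python
import numpy as np

def other_indices(n, idx):
    """Return the sorted indices in range(n) that do not appear in idx."""
    mask = np.ones(n, dtype=bool)  # start with every index marked as kept
    mask[idx] = False              # unmark the listed indices
    return np.flatnonzero(mask)    # positions where the mask is still True

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([0, 3, 5])
not_in_b = other_indices(len(A), B)
print(not_in_b)
```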

Edit

Some benchmarks.

In [9]: a = np.ones(1000000)

In [10]: b = np.random.choice(1000000, size=10000, replace=False)

In [11]: def test1(a, b):
    ...:     mask = np.ones(len(a), dtype=bool)
    ...:     mask[b] = False
    ...:     return np.arange(len(a))[mask]
    ...: 
    ...: 

In [12]: %timeit test1(a, b)
4.72 ms ± 15 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [13]: %timeit np.delete(np.arange(a.size), b)
4.72 ms ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Surprisingly enough, @Kasramvd's solution is not faster than mine, though it is quite a bit cleaner. Given these results, I would not be surprised if np.delete is actually a thin wrapper around the same logic I've implemented. Therefore I see no reason to prefer my solution over @Kasramvd's.
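
A quick sanity check, sketched below, is to confirm that the two approaches return identical arrays on random input. This only shows that the outputs agree; it does not establish how np.delete is implemented. The array size, sample size and seed are arbitrary assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 100_000
b = rng.choice(n, size=1_000, replace=False)  # unique indices to exclude

# Boolean-mask approach
mask = np.ones(n, dtype=bool)
mask[b] = False
via_mask = np.arange(n)[mask]

# np.delete approach
via_delete = np.delete(np.arange(n), b)

same = np.array_equal(via_mask, via_delete)
print(same)
```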

