Find number of zeros before non-zero in a numpy array

I have a numpy array A. I would like to return the number of zeros before the first non-zero in A, and to do so efficiently because this runs inside a loop.

If A = np.array([0,1,2]) then np.nonzero(A)[0][0] returns 1. However, if A = np.array([0,0,0]) this doesn't work (I would like the answer 3 in this case). Also, if A is very big and the first non-zero is near the beginning, this seems inefficient.

Here's an iterative Cython version, which may be your best bet if this is a serious bottleneck:

# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython

# np.int was removed in NumPy 1.24; use an explicit fixed-width integer type
DTYPE = np.int64
ctypedef np.int64_t DTYPE_t

@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
    # Py_ssize_t avoids overflow for arrays longer than a C int can index
    cdef Py_ssize_t elements = a.size
    cdef Py_ssize_t i = 0
    cdef Py_ssize_t count = 0
    while i < elements:
        if a[i] == 0:
            count += 1
        else:
            return count
        i += 1
    return count

This is similar to @mtrw's answer but with indexing at native speeds. My Cython is a bit sketchy so there may be further improvements to be made.

A quick test of an extremely favourable case in IPython, using a few different methods:

In [1]: import numpy as np

In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)

In [3]: import count_leading_zeros

In [4]: %paste
def count_leading_zeros_python(x):
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int64)

In [6]: a[5] = 1

In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop

In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop

In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop

In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop

However, I'd only use something like this if I had evidence (from a profiler) that this was a bottleneck. Many things may seem inefficient but are never worth your time to fix.
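One way to gather that evidence is the standard-library `cProfile` module. The following is only a minimal sketch: `workload` and the test arrays are hypothetical stand-ins for whatever loop actually calls the counter in your program.

```python
import cProfile
import io
import pstats

import numpy as np


def count_leading_zeros_python(x):
    """Pure-Python loop, same logic as in the IPython session above."""
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr


def workload(arrays):
    """Hypothetical stand-in for the real loop that calls the counter."""
    return [count_leading_zeros_python(arr) for arr in arrays]


# Made-up test data: 200 arrays whose first non-zero sits at index 50
arrays = []
for _ in range(200):
    arr = np.zeros(1000, dtype=np.int64)
    arr[50] = 1
    arrays.append(arr)

profiler = cProfile.Profile()
profiler.enable()
results = workload(arrays)
profiler.disable()

# Print the entries with the largest cumulative time (at most five)
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If the counting function accounts for only a small share of the cumulative time in your real program, the Cython version is probably not worth the extra build step.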

By adding a nonzero number at the end of the array, you can still use np.nonzero to get your desired outcome.

A = np.array([0,1,2])
B = np.array([0,0,0])

np.min(np.nonzero(np.hstack((A, 1))))   # --> 1
np.min(np.nonzero(np.hstack((B, 1))))   # --> 3

i = np.argmax(A != 0)
if i == 0 and np.all(A == 0):
    i = len(A)

This is likely the fastest solution without extensions, although A != 0 still builds a boolean array over the whole input. It is also easily vectorized to act along an axis of a multidimensional array.
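As a sketch of how the argmax approach could be extended to work along an axis (the function name `count_leading_zeros_axis` is made up for illustration), the all-zero case can be handled with `any` instead of a separate `if`:

```python
import numpy as np


def count_leading_zeros_axis(a, axis=-1):
    """Count leading zeros along `axis` (hypothetical helper name)."""
    a = np.asarray(a)
    nonzero = a != 0
    # argmax returns the index of the first True along the axis,
    # or 0 when there is no True at all
    first = np.argmax(nonzero, axis=axis)
    # Where a slice has no non-zero element, use the axis length instead
    return np.where(nonzero.any(axis=axis), first, a.shape[axis])


M = np.array([[0, 1, 2],
              [0, 0, 0],
              [3, 0, 0]])
print(count_leading_zeros_axis(M, axis=1))  # one count per row
print(count_leading_zeros_axis(M, axis=0))  # one count per column
```

Like the one-dimensional version, this assumes the array is non-empty along the chosen axis, since `np.argmax` raises an error on an empty sequence.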

What's wrong with the naive approach?

def countLeadingZeros(x):
    """Count the elements before the first non-zero element and return that count."""
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else: #short circuit evaluation, we found a non-zero so return immediately
            return ctr
    return ctr #we get here in the case that x was all zeros

This returns as soon as a non-zero element is found, so it is O(n) in the worst case. You could make it faster by porting it to C, but it would be worth testing to see if that is really necessary for the arrays you're working with.
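A possible middle ground between the Python loop and a C port is to scan the array in blocks, so that each block is checked at NumPy speed but the scan still stops early. This is only a sketch and not part of the answer above; the function name and the `chunk_size` default are arbitrary choices.

```python
import numpy as np


def count_leading_zeros_chunked(a, chunk_size=4096):
    """Scan a 1-D array block by block and stop at the first non-zero.

    `chunk_size` is a tuning parameter chosen arbitrarily for this sketch.
    """
    a = np.asarray(a)
    n = a.shape[0]
    for start in range(0, n, chunk_size):
        block = a[start:start + chunk_size]  # a view, no copy
        hits = np.flatnonzero(block)
        if hits.size:
            # First non-zero found: its global index is the zero count
            return start + int(hits[0])
    return n  # all zeros (or empty input)


print(count_leading_zeros_chunked(np.array([0, 1, 2])))
print(count_leading_zeros_chunked(np.array([0, 0, 0])))
```

When the first non-zero is near the beginning, only one block is examined; when the array is all zeros, the cost is roughly that of one full pass plus a small per-block overhead.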

I am surprised nobody has used np.where yet.

np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0] will do the trick

>> a = np.array([0,1,2])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 1
>> a = np.array([0,0,0])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 3
>> a = np.array([1,2,3])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 0
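The one-liner calls np.where twice on the same array. A slightly tidier sketch, assuming a one-dimensional input, computes the indices once with `np.flatnonzero`; the wrapper name `count_leading_zeros_where` is made up for illustration.

```python
import numpy as np


def count_leading_zeros_where(a):
    """Same idea as the one-liner, but the indices are computed once."""
    a = np.asarray(a)
    idx = np.flatnonzero(a)  # indices of the non-zero entries
    # No non-zero entries means every element is a leading zero
    return int(idx[0]) if idx.size else a.shape[0]


for values in ([0, 1, 2], [0, 0, 0], [1, 2, 3]):
    print(values, count_leading_zeros_where(np.array(values)))
```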

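If you don't care about speed, I have a small trick to do the job. It marks each zero that is immediately followed by a non-zero, so the result is the index of the last leading zero (the count is that index plus one), and it assumes the array starts with a zero: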

a = np.array([0,0,1,1,1])
t = np.where(a==0,1,0)+np.append(np.where(a==0,0,1),0)[1:]
print(t)
[1 2 1 1 0]
np.where(t==2)
(array([1]),)
