I have a numpy array A
. I would like to return the number of zeros before a non-zero in A
in an efficient way as it is in a loop.
If A = np.array([0,1,2])
then np.nonzero(A)[0][0]
returns 1. However if A = np.array([0,0,0])
this doesn't work (I would like the answer 3 in this case). And also if A is very big and the first non-zero is near the beginning this seems inefficient.
Here's an iterative Cython version, which may be your best bet if this is a serious bottleneck
# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython
DTYPE = np.int
ctypedef np.int_t DTYPE_t
@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
cdef int elements = a.size
cdef int i = 0
cdef int count = 0
while i < elements:
if a[i] == 0:
count += 1
else:
return count
i += 1
return count
This is similar to @mtrw's answer but with indexing at native speeds. My Cython is a bit sketchy so there may be further improvements to be made.
A quick test of an extremely favourable case with IPython with a few different methods
In [1]: import numpy as np
In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)
In [3]: import count_leading_zeros
In [4]: %paste
def count_leading_zeros_python(x):
ctr = 0
for k in x:
if k == 0:
ctr += 1
else:
return ctr
return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int)
In [6]: a[5] = 1
In [7]:
In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop
In [8]:
In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop
In [9]:
In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop
In [10]:
In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop
However I'd only use something like this if I had evidence (with a profiler) that this was a bottleneck. Many things may seem inefficient but are never worth your time to fix.
By adding a nonzero number at the end of the array, you can still use np.nonzero to get your desired outcome.
A = np.array([0,1,2])
B = np.array([0,0,0])
np.min(np.nonzero(np.hstack((A, 1)))) # --> 1
np.min(np.nonzero(np.hstack((B, 1)))) # --> 3
i = np.argmax(A!=0)
if i==0 and np.all(A==0): i=len(A)
This should be the most performant solution without extensions. Also easily vectorized to act along multiple axes.
What's wrong with the naive approach:
def countLeadingZeros(x):
""" Count number of elements up to the first non-zero element, return that count """
ctr = 0
for k in x:
if k == 0:
ctr += 1
else: #short circuit evaluation, we found a non-zero so return immediately
return ctr
return ctr #we get here in the case that x was all zeros
This returns as soon as a non-zero element is found, so it is O(n) in the worst case. You could make it faster by porting it to C, but it would be worth testing to see if that is really necessary for the arrays you're working with.
I am surprised why nobody has not used np.where
yet
np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0]
will do the trick
>> a = np.array([0,1,2])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0]
... 1
>> a = np.array([0,0,0))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0]
... 3
>> a = np.array([1,2,3))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0]
... 0
If you don't care about the speed, I have a small trick to do the job:
a = np.array([0,0,1,1,1])
t = np.where(a==0,1,0)+np.append(np.where(a==0,0,1),0)[1:]
print t
[1 2 1 1 0]
np.where(t==2)
(array([1]),)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.