Efficiently finding the interval with non-zeros in scipy/numpy in Python?

Question

Suppose I have a Python list or a 1-d numpy array. Assuming it contains a single contiguous stretch of non-zero elements, how can I find the start and end coordinates (i.e. indices) of that stretch? For example,

a = [0, 0, 0, 0, 1, 2, 3, 4]

nonzero_coords(a) should return [4, 7]. For:

b = [1, 2, 3, 4, 0, 0]

nonzero_coords(b) should return [0, 2].

thanks.

Answer 1

Assuming there's a single continuous stretch of nonzero elements...

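import numpy as np

x = np.nonzero(a)[0]    # indices of all non-zero elements
result = [x[0], x[-1]]  # first and last non-zero index (inclusive)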
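
If the input might contain no non-zero values at all, x[0] raises an IndexError on the empty index array. A minimal sketch of a guarded wrapper follows. The name nonzero_coords comes from the question; returning None for the all-zero case is my own assumption, not something the asker specified.

```python
import numpy as np

def nonzero_coords(a):
    """Return [first, last] inclusive indices of the non-zero elements, or None."""
    idx = np.flatnonzero(np.asarray(a))  # indices of all non-zero entries
    if idx.size == 0:
        # Assumption: signal "no non-zero stretch" with None
        return None
    return [int(idx[0]), int(idx[-1])]

print(nonzero_coords([0, 0, 0, 0, 1, 2, 3, 4]))  # [4, 7]
print(nonzero_coords([1, 2, 3, 4, 0, 0]))        # [0, 3]
print(nonzero_coords([0, 0, 0]))                 # None
```

Like the two-liner, this only reports the outermost non-zero indices, so any zeros in between are silently included.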

Answer 2

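This worked for me with multiple holes. Each interval is returned as (first, last), with both indices inclusive:

import numpy as np

def nonzero_intervals(value):
    # pad with a zero on each side so runs touching either end are detected,
    # without overwriting the first and last elements of the input
    lvalue = np.concatenate(([0], np.asarray(value), [0]))
    a = np.diff((lvalue == 0) * 1)
    starts = np.flatnonzero(a == -1)   # first index of each non-zero run
    ends = np.flatnonzero(a == 1) - 1  # last index of each non-zero run
    return list(zip(starts, ends))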

Answer 3

Actually, nonzero_coords(b) should return [0, 3]. Can multiple holes occur in the input? If so, what should happen then? The naive solution: scan until the first non-zero element, then scan until the last non-zero element. Code is below (sorry, did not test it):

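def nonzero_coords(a):
    start = 0
    size = len(a)
    # skip leading zeros to reach the first non-zero element
    while start < size and a[start] == 0:
        start += 1
    # advance past the non-zero run; end stops one past its last element
    end = start
    while end < size and a[end] != 0:
        end += 1
    return (start, end - 1)

a = [0, 0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0]
print(nonzero_coords(a))  # (4, 8)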

Answer 4

It would be more consistent with Python indexing for nonzero_coords([0, 0, 0, 0, 1, 2, 3, 4]) to return (4, 8) than (4, 7), because [0, 0, 0, 0, 1, 2, 3, 4][4:8] returns [1, 2, 3, 4].
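
A quick illustration of that point (the variable names here are mine, chosen only for the example): with a half-open (start, stop) pair the slice and the length fall out directly, while an inclusive pair needs a +1 in both places.

```python
a = [0, 0, 0, 0, 1, 2, 3, 4]

# Half-open convention: stop is one past the last non-zero element
start, stop = 4, 8
half_open_slice = a[start:stop]
half_open_len = stop - start

# Inclusive convention: last is the index of the last non-zero element
first, last = 4, 7
inclusive_slice = a[first:last + 1]
inclusive_len = last - first + 1

print(half_open_slice, half_open_len)  # [1, 2, 3, 4] 4
print(inclusive_slice, inclusive_len)  # [1, 2, 3, 4] 4
```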

Here is a function that computes non-zero intervals. It handles multiple intervals:

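import numpy as np

def nonzero_intervals(vec):
    '''
    Find islands of non-zeros in the vector vec
    '''
    if len(vec)==0:
        return []
    elif not isinstance(vec, np.ndarray):
        vec = np.array(vec)

    # positions where the zero/non-zero state flips between neighbours
    edges, = np.nonzero(np.diff((vec==0)*1))
    edge_vec = [edges+1]
    # an island touching either end of vec has no flip there, so add that edge
    if vec[0] != 0:
        edge_vec.insert(0, [0])
    if vec[-1] != 0:
        edge_vec.append([len(vec)])
    edges = np.concatenate(edge_vec)
    # pair the edges up as half-open (start, stop) intervals;
    # list() is needed on Python 3, where zip returns an iterator
    return list(zip(edges[::2], edges[1::2]))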

If you really want the end index of each island to be included in the interval, you can just change the last line to: return list(zip(edges[::2], edges[1::2]-1))
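
To see what the function is doing, here is a rough step-by-step trace of the same diff-based steps on one input, written out at module level. The intermediate names is_zero and steps are mine and do not appear in the function above.

```python
import numpy as np

vec = np.array([1, 2, 0, 0, 0, 3, 4, 0])

is_zero = (vec == 0) * 1            # 1 where the element is zero
steps = np.diff(is_zero)            # non-zero wherever the state flips
edges = np.nonzero(steps)[0] + 1    # index of the first element after each flip

# vec starts with a non-zero, so index 0 opens an island that diff cannot see
if vec[0] != 0:
    edges = np.concatenate(([0], edges))
# vec ends with a zero here; otherwise len(vec) would close the last island
if vec[-1] != 0:
    edges = np.concatenate((edges, [len(vec)]))

# even-positioned edges open an island, odd-positioned edges close it
intervals = [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

print(is_zero)    # [0 0 1 1 1 0 0 1]
print(steps)      # [ 0  1  0  0 -1  0  1]
print(edges)      # [0 2 5 7]
print(intervals)  # [(0, 2), (5, 7)]
```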

Tests:

a = [0, 0, 0, 0, 1, 2, 3, 4]
intervals = nonzero_intervals(a)
assert intervals == [(4, 8)]

a = [1, 2, 3, 4, 0, 0]
intervals = nonzero_intervals(a)
assert intervals == [(0, 4)]

a = [1, 2, 0, 0, 0, 3, 4, 0]
intervals = nonzero_intervals(a)
assert intervals == [(0, 2), (5, 7)]

a = [0, 4, 0, 6, 0, 6, 7, 0, 9]
intervals = nonzero_intervals(a)
assert intervals == [(1, 2), (3, 4), (5, 7), (8, 9)]

a = [1, 2, 3, 4]
intervals = nonzero_intervals(a)
assert intervals == [(0, 4)]

a = [0, 0, 0]
intervals = nonzero_intervals(a)
assert intervals == []

a = []
intervals = nonzero_intervals(a)
assert intervals == []

Answer 5

If you've got numpy loaded anyway, go with tom10's answer.

If for some reason you want something that works without loading numpy (can't imagine why, to be honest) then I'd suggest something like this:

from itertools import groupby

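def nonzero_coords(iterable):
    start = 0
    # groupby yields consecutive runs of zeros (True) and non-zeros (False)
    for iszero, sublist in groupby(iterable, lambda x: x == 0):
        if iszero:
            # skip over a run of zeros
            start += len(list(sublist))
        else:
            # first run of non-zeros: return inclusive (start, end) indices
            return start, start + len(list(sublist)) - 1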
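
The function above stops at the first stretch of non-zeros (and returns None if there is none). If the input can contain several stretches, the same groupby idea can be extended along these lines. This is only a sketch: nonzero_runs is a hypothetical name of mine, and it reports inclusive (first, last) pairs to match nonzero_coords.

```python
from itertools import groupby

def nonzero_runs(iterable):
    """Yield inclusive (first, last) index pairs for every run of non-zeros."""
    indexed = enumerate(iterable)  # pairs of (index, value)
    for is_zero, group in groupby(indexed, key=lambda pair: pair[1] == 0):
        if not is_zero:
            indices = [i for i, _ in group]
            yield indices[0], indices[-1]

print(list(nonzero_runs([1, 2, 0, 0, 0, 3, 4, 0])))  # [(0, 1), (5, 6)]
```

Because it is a generator, wrap the call in list() when you need all the runs at once.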
