Efficiently finding the interval of non-zeros in a list or NumPy array in Python?

Suppose I have a Python list or a 1-D NumPy array. Assuming it contains a single contiguous stretch of non-zero elements, how can I find the start and end coordinates (i.e. indices) of that stretch? For example,

a = [0, 0, 0, 0, 1, 2, 3, 4]

nonzero_coords(a) should return [4, 7]. For:

b = [1, 2, 3, 4, 0, 0]

nonzero_coords(b) should return [0, 2].

thanks.

Assuming there's a single continuous stretch of nonzero elements...

import numpy as np
x = np.nonzero(a)[0]
result = [x[0], x[-1]]
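
A minimal sketch that wraps the snippet above in a function and guards the all-zero case, where the index array is empty and x[0] would raise an IndexError. The name nonzero_coords comes from the question; returning None when there are no non-zeros is an assumption, since the question does not say what should happen then.

```python
import numpy as np

def nonzero_coords(a):
    """Return [first, last] (inclusive) indices of the non-zero elements of a."""
    x = np.flatnonzero(np.asarray(a))
    if x.size == 0:
        return None  # assumption: no non-zero elements means no interval
    return [int(x[0]), int(x[-1])]

print(nonzero_coords([0, 0, 0, 0, 1, 2, 3, 4]))  # [4, 7]
print(nonzero_coords([1, 2, 3, 4, 0, 0]))        # [0, 3]
```

Note that this only reports the outermost non-zero indices, so it is correct only under the single-stretch assumption.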

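This worked for me with multiple holes (it returns half-open (start, end) pairs, one per run of non-zeros):

import numpy as np

def nonzero_intervals(value):
    # Pad with a "zero" flag on each side so runs touching either end are found.
    is_zero = np.concatenate(([1], np.asarray(value) == 0, [1]))
    a = np.diff(is_zero)
    # a == -1 marks the first index of a run; a == 1 marks one past its last index.
    return list(zip(np.flatnonzero(a == -1), np.flatnonzero(a == 1)))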

Actually, nonzero_coords(b) should return [0, 3]. Can multiple holes occur in the input? If so, what should happen then? The naive solution: scan to the first non-zero element, then keep scanning until that non-zero run ends. Code is below (sorry, I did not test it):

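a = [0, 0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0]
start = 0
size = len(a)
# Skip leading zeros to reach the first non-zero element.
while start < size and a[start] == 0: start += 1
end = start
# Advance through the non-zero run; end stops one past its last element.
while end < size and a[end] != 0: end += 1
result = (start, end)  # half-open interval, here (4, 9)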

It would be more consistent with Python indexing for nonzero_coords([0, 0, 0, 0, 1, 2, 3, 4]) to return (4, 8) rather than (4, 7), because [0, 0, 0, 0, 1, 2, 3, 4][4:8] returns [1, 2, 3, 4].
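
To illustrate the point about half-open intervals, here is a small sketch comparing the two conventions on the list from the question. The variable names are only for illustration.

```python
a = [0, 0, 0, 0, 1, 2, 3, 4]

# Half-open convention: end is one past the last non-zero element.
start, end = 4, 8
run = a[start:end]            # slices directly, no +1 adjustment
length = end - start          # length is a plain difference

# Inclusive convention: last is the index of the last non-zero element.
first, last = 4, 7
run_inclusive = a[first:last + 1]   # needs +1 to slice
length_inclusive = last - first + 1 # and +1 for the length

print(run, length)  # [1, 2, 3, 4] 4
```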

Here is a function that computes non-zero intervals. It handles multiple intervals:

import numpy as np

def nonzero_intervals(vec):
    '''
    Find islands of non-zeros in the vector vec
    '''
    if len(vec)==0:
        return []
    elif not isinstance(vec, np.ndarray):
        vec = np.array(vec)

    edges, = np.nonzero(np.diff((vec==0)*1))
    edge_vec = [edges+1]
    if vec[0] != 0:
        edge_vec.insert(0, [0])
    if vec[-1] != 0:
        edge_vec.append([len(vec)])
    edges = np.concatenate(edge_vec)
    # Wrap in list() so the result compares equal to a list on Python 3.
    return list(zip(edges[::2], edges[1::2]))

If you really want the end index to be included in the island, you can just change the last line to: return list(zip(edges[::2], edges[1::2]-1))

Tests:

a = [0, 0, 0, 0, 1, 2, 3, 4]
intervals = nonzero_intervals(a)
assert intervals == [(4, 8)]

a = [1, 2, 3, 4, 0, 0]
intervals = nonzero_intervals(a)
assert intervals == [(0, 4)]

a=[1, 2, 0, 0, 0, 3, 4, 0]
intervals = nonzero_intervals(a)
assert intervals == [(0, 2), (5, 7)]

a = [0, 4, 0, 6, 0, 6, 7, 0, 9]
intervals = nonzero_intervals(a)
assert intervals == [(1, 2), (3, 4), (5, 7), (8, 9)]

a = [1, 2, 3, 4]
intervals = nonzero_intervals(a)
assert intervals == [(0, 4)]

a = [0, 0, 0]
intervals = nonzero_intervals(a)
assert intervals == []

a = []
intervals = nonzero_intervals(a)
assert intervals == []

If you've got numpy loaded anyway, go with tom10's answer.

If for some reason you want something that works without loading numpy (can't imagine why, to be honest) then I'd suggest something like this:

from itertools import groupby

def nonzero_coords(iterable):
  start = 0
  for iszero, sublist in groupby(iterable, lambda x:x==0):
    if iszero:
      start += len(list(sublist))
    else:
      return start, start+len(list(sublist))-1
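
The groupby approach above stops at the first run of non-zeros. As a sketch only, the same idea can be extended to report every run without numpy. The name nonzero_intervals_pure is hypothetical, and the half-open (start, end) output is a choice made here to match the slicing convention discussed earlier in the thread, not something the answer above prescribes.

```python
from itertools import groupby

def nonzero_intervals_pure(iterable):
    """Yield half-open (start, end) index pairs, one per run of non-zeros."""
    pos = 0
    for is_zero, group in groupby(iterable, key=lambda x: x == 0):
        n = sum(1 for _ in group)  # run length, without building a list
        if not is_zero:
            yield (pos, pos + n)
        pos += n                   # advance past this run either way

print(list(nonzero_intervals_pure([1, 2, 0, 0, 0, 3, 4, 0])))  # [(0, 2), (5, 7)]
```

Because it is a generator, it works on any iterable and yields nothing for empty or all-zero input.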
