简体   繁体   中英

Finding contiguous regions in a 1D boolean array

If I have a 1D array of bools, say np.array([True, False, True, True True, False, True, True]) , I'd like a function that could return to me the indices of all contiguous regions of True .

For example, the output of this function called on the above array would produce something like [(0,0), (2,4), (6,7)] .

I'm not sure how to accomplish this nicely, also I would like to be able to do this with PyTorch tensors as well.

Here's one solution I found:

def find_contigs(mask):
    """
    Find contiguous regions in a mask that are True with no False in between
    """
    assert len(mask.shape) == 1 # 1D tensor of bools 
    
    contigs = []
    found_contig = False 
    for i,b in enumerate(mask):
        
        
        if b and not found_contig:   # found the beginning of a contig
            contig = [i]
            found_contig = True 
        
        elif b and found_contig:     # currently have contig, continuing it 
            pass 
        
        elif not b and found_contig: # found the end, record previous index as end, reset indicator  
            contig.append(i-1)
            found_contig = False 
            contigs.append(tuple(contig))
        
        else:                        # currently don't have a contig, and didn't find one 
            pass 
    
    
    # fence post bug - check if the very last entry was True and we didn't get to finish 
    if b:
        contig.append(i)
        found_contig = False 
        contigs.append(tuple(contig))
        
    return contigs

I'm not sure if there's a good purely Numpy approach to this, but since it looks like you're willing to loop, you could just use itertools.groupby and keep track of the index and group length:

from itertools import groupby

a = np.array([True, False, True, True, True, False, True, True])

i = 0
res = []

for k, g in groupby(a):
    l = len(list(g))
    if k:
        res.append((i,i+l))
    i += l

print(res)
# [(0, 1), (2, 5), (6, 8)]

If you want the closed intervals, obviously you can just subtract 1 from the second tuple value.

You could also write it as a generator which can be a little friendlier on the memory with long lists:

from itertools import groupby

a = np.array([True, False, True, True, True, False, True, True])

def true_indices(a):
    i = 0
    for k, g in groupby(a):
        l = len(list(g))
        if k:
            yield (i,i+l)
        i += l

list(true_indices(a))
# [(0, 1), (2, 5), (6, 8)]

I've implemented this exact thing in a utility library I wrote called haggis . The function is haggis. math.mask2runs haggis. math.mask2runs . I wrote it in part to deal with this recurring question on Stack Overflow:

runs = haggis.math.mask2runs(mask)

The second column is exclusive indices, since that's more useful in both python and numpy, so you might want to do

runs[:, -1] -= 1

There's nothing special about this function. You can write it as a one-liner using numpy:

runs = numpy.flatnonzero(np.diff(numpy.r_[numpy.int8(0), mask.view(numpy.int8), numpy.int8(0)])).reshape(-1, 2)

I'm not too familiar with Pytorch but the problem itself looks easily solvable. You're going to want to loop over the whole list and store the first index of a contingency region where one is found. I've used -1 since it's out of bounds of any index. From there you just append a tuple to a list with the saved index and current index and reset the start index to -1. Since this relies on a False boolean being found, I've added a final check outside the loop for a non-reset start index which implies a contingency region from there to the end of the list.

I said current index but to keep it inbound I subtracted 1 to make it more in line with what you wanted.

 def getConRegions(booleanList): conRegions=[] start=-1 for index,item in enumerate(booleanList): if item and start<0:start=index elif start>=0 and not item: conRegions.append((start,index-1)) start=-1 if(start>=0):conRegions.append((start,len(booleanList)-1)) return conRegions print(getConRegions([True,False,True,True,True,False,True,True]))

My humble solution:

list = [True, False, True, True, True, False, True, True]
if list[-1] == True:
    list.extend([False])

answers = []
beginning = "yes"

for index, value in enumerate(list):
    if value == True and beginning == "yes":
        x = index
        beginning = "not anymore"
    elif value == True:
        pass
    else:
        answers.append((x, index -1))
        beginning = "yes"
        
print(answers)

Output:

[(0, 0), (2, 4), (6, 7)]

Fast, numpy-based solution:

import numpy as np

def find_contigs(arr):
    indices = []
    # Each number in this array will be the index of an
    # element in 'arr' just *before* it switches
    nonzeros = np.nonzero(np.diff(arr))[0]

    # Case if array begins with True
    if arr[0]:
        if len(nonzeros):
            indices.append((0, nonzeros[0] + 1))
        else:
            indices.append((0, len(arr)))
        nonzeros = nonzeros[1:]

    # Parse nonzero indices
    for idx in range(0, len(nonzeros), 2):
        if len(nonzeros) - idx == 1:
            indices.append((nonzeros[idx] + 1, len(arr)))
        else:
            indices.append((nonzeros[idx] + 1, nonzeros[idx+1] + 1))

    return indices

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM