Keep sub-sequences of a binary list if they surpass a given length

Question

I want to create a function that takes as input a list (or numpy array) A and a number L. A contains only 0s and 1s, and the goal is to keep each sub-sequence of consecutive 1s only if its length is at least L; shorter ones are replaced with 0s. I wrote a function fix(A,L) that does this, but it takes too long to run, so I wanted to know if there is a faster way of doing it.

def fix(A,L):
    i=0
    while True:
        if i==len(A):
            return(A)
        if A[i]==1:
            s=0
            for j in range(i,len(A)):
                if A[j]==1:
                    s+=1
                    continue
                else:
                    if s>=L:
                        break
                    else:
                        A[i:j]=[0]*len(A[i:j])
                        break
            if A[j]==1 and s<L:
                A[i:j+1]=[0]*len(A[i:j+1])
            i=j+1
        else:
            i+=1
            continue

If I call fix([1,0,0,1,1,1,0,1,1,1,1,0,1,1,0,1], 3), it returns [0,0,0,1,1,1,0,1,1,1,1,0,0,0,0,0], which is the correct answer.

Answer 1

You can use itertools.groupby and itertools.chain:

from itertools import groupby, chain

def fix(A, L):
    # The := operator requires Python 3.8 or later.
    return list(chain.from_iterable(
        l if ((len(l := list(g)) >= L and k) or not k) else [0] * len(l)
        for k, g in groupby(A)))
    
fix([1,0,0,1,1,1,0,1,1,1,1,0,1,1,0,1], 3)
# [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

How it works

groupby(A) groups consecutive equal values, so each group is a run of 0s or a run of 1s. For each group we build a list, take its length, and check whether its key k is 0 or 1. A run of 0s, or a run of 1s of length ≥ L, is kept as is; a shorter run of 1s is replaced with the same number of 0s. Finally, chain.from_iterable concatenates the groups into one flat list.
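The same steps can be written as an explicit loop, which may be easier to follow than the one-liner. This is only a sketch of the logic described above; the name fix_readable is made up for illustration and is not part of the original answer.

```python
from itertools import groupby

def fix_readable(A, L):
    # Hypothetical name: same logic as the one-liner, spelled out step by step.
    result = []
    for value, group in groupby(A):
        # Count the elements in this run of identical values.
        run_length = sum(1 for _ in group)
        if value == 1 and run_length < L:
            # Run of 1s that is too short: replace it with 0s.
            result.extend([0] * run_length)
        else:
            # Run of 0s, or run of 1s with length >= L: keep it.
            result.extend([value] * run_length)
    return result

print(fix_readable([1,0,0,1,1,1,0,1,1,1,1,0,1,1,0,1], 3))
# [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Like the one-liner, this returns a new list and leaves the input unchanged.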

Answer 2

If you're working with 2D numpy arrays, what you want to achieve can be done with a binary erosion followed by a binary dilation (a morphological opening). We can use scipy.ndimage.binary_erosion and scipy.ndimage.binary_dilation.

Here the structuring element spans a single dimension, so each row is processed independently:

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

np.random.seed(0)
A = np.random.randint(0, 2, (10, 20))

L = 3
mask = np.ones((1, L))  # 1 x L structuring element: acts along each row
binary_dilation(binary_erosion(A, mask), mask).astype(int)

Example input:

array([[0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
       [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0],
       [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1],
       [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
       [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0],
       [1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
       [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1],
       [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0],
       [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]])

Output:

array([[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
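To see why erosion followed by dilation keeps only runs of at least L ones, here is a pure-Python sketch for a single row. It is not scipy's implementation: the helper names erode, dilate and open_row are made up for illustration, the window is anchored at its left edge instead of being centred, and L is assumed to be at least 1. For this run filter the combined result should be the same.

```python
def erode(row, L):
    # Mark position i only if a full window of L ones starts at i.
    n = len(row)
    return [int(i + L <= n and all(row[i:i + L])) for i in range(n)]

def dilate(marks, L):
    # Spread each mark over the L positions its window covered.
    return [int(any(marks[max(0, i - L + 1):i + 1])) for i in range(len(marks))]

def open_row(row, L):
    # Erosion removes runs shorter than L; dilation restores the survivors.
    return dilate(erode(row, L), L)

print(open_row([1,0,0,1,1,1,0,1,1,1,1,0,1,1,0,1], 3))
# [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

A run shorter than L never contains a full window, so erosion leaves no mark inside it and dilation has nothing to restore.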

