
Removing completely isolated cells from Python array?

I'm trying to reduce noise in a binary Python array by removing all completely isolated single cells, i.e. setting "1"-value cells to 0 if they are completely surrounded by "0"s. I have been able to get a working solution by removing blobs of size 1 with a loop, but this seems like a very inefficient approach for large arrays:

import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt    

# Generate sample data
square = np.zeros((32, 32))
square[10:-10, 10:-10] = 1
np.random.seed(12)
x, y = (32*np.random.random((2, 20))).astype(int)
square[x, y] = 1

# Plot original data with many isolated single cells
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

# Assign unique labels
id_regions, number_of_ids = ndimage.label(square, structure=np.ones((3,3)))

# Set blobs of size 1 to 0
for i in range(number_of_ids + 1):
    if id_regions[id_regions==i].size == 1:
        square[id_regions==i] = 0

# Plot desired output, with all isolated single cells removed
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

In this case, eroding and dilating my array won't work as it will also remove features with a width of 1. I feel the solution lies somewhere within the scipy.ndimage package, but so far I haven't been able to crack it. Any help would be greatly appreciated!

A belated thanks to both Jaime and Kazemakase for their replies. The manual neighbour-checking method did remove all isolated patches, but also removed patches attached to others by only one corner (i.e. to the upper-right of the square in the sample array). The summed area table works perfectly and is very fast on the small sample array, but slows down on larger arrays.

I ended up following an approach using ndimage which seems to work efficiently for very large and sparse arrays (0.91 sec for a 5000 x 5000 array vs 1.17 sec for the summed area table approach). I first generate a labelled array of unique IDs for each discrete region, calculate the size of each region, mask the size array to keep only the size == 1 blobs, and then index the original array to set the cells belonging to those regions to 0:

def filter_isolated_cells(array, struct):
    """ Return array with completely isolated single cells removed
    :param array: Array with completely isolated single cells
    :param struct: Structure array for generating unique regions
    :return: Array with minimum region size > 1
    """

    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    id_sizes = np.array(ndimage.sum(array, id_regions, range(num_ids + 1)))
    area_mask = (id_sizes == 1)
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

# Run function on sample array
filtered_array = filter_isolated_cells(square, struct=np.ones((3,3)))

# Plot output, with all isolated single cells removed
plt.imshow(filtered_array, cmap=plt.cm.gray, interpolation='nearest')

Result: (image of the resulting array)
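For reference, a rough sketch (with an assumed sparse test array rather than the asker's actual data) of how the 5000 x 5000 timing quoted above could be reproduced:

import timeit

# Assumed test data: a 5000 x 5000 array with roughly 0.1% nonzero cells
big = (np.random.random((5000, 5000)) > 0.999).astype(float)
t = timeit.timeit(lambda: filter_isolated_cells(big, struct=np.ones((3, 3))), number=1)
print('filter_isolated_cells on 5000 x 5000: %.2f s' % t)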

The typical way of getting rid of isolated pixels in image processing is to do a morphological opening, for which there is a ready-made implementation in scipy.ndimage.morphology.binary_opening. However, this would affect the contours of your larger areas as well.
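For illustration, a minimal sketch of that ready-made route (using the square array from the question); note that the opening also erodes features that are only one pixel wide, which is why it is not a good fit here:

from scipy import ndimage

# Opening = erosion followed by dilation with the same structuring element.
# Isolated pixels disappear, but so do 1-pixel-wide parts of larger features.
opened = ndimage.binary_opening(square, structure=np.ones((3, 3)))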

As for a DIY solution, I would use a summed area table to count the number of items in every 3x3 subimage, subtract from that the value of the central pixel, then zero all center points where the result came out to zero. To properly handle the borders, first pad the array with zeros:

sat = np.pad(square, pad_width=1, mode='constant', constant_values=0)
sat = np.cumsum(np.cumsum(sat, axis=0), axis=1)
sat = np.pad(sat, ((1, 0), (1, 0)), mode='constant', constant_values=0)
# These are all the possible overlapping 3x3 windows sums
sum3x3 = sat[3:, 3:] + sat[:-3, :-3] - sat[3:, :-3] - sat[:-3, 3:]
# This takes away the central pixel value
sum3x3 -= square
# This zeros all the isolated pixels
square[sum3x3 == 0] = 0

The implementation above works, but isn't especially careful about avoiding intermediate arrays, so you can probably shave off some execution time with a suitable refactoring.
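For example, one possible (untested) variant of that refactoring, building the window sums once and then updating them in place so fewer temporaries are allocated:

sat = np.pad(square, pad_width=1, mode='constant', constant_values=0)
sat = np.cumsum(np.cumsum(sat, axis=0), axis=1)
sat = np.pad(sat, ((1, 0), (1, 0)), mode='constant', constant_values=0)
sum3x3 = sat[3:, 3:] + sat[:-3, :-3]  # allocate the window sums once...
sum3x3 -= sat[3:, :-3]                # ...then subtract the rest in place
sum3x3 -= sat[:-3, 3:]
sum3x3 -= square
square[sum3x3 == 0] = 0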

You can manually check the neighbors and avoid the loop using vectorization.

has_neighbor = np.zeros(square.shape, bool)
has_neighbor[:, 1:] = np.logical_or(has_neighbor[:, 1:], square[:, :-1] > 0)  # left
has_neighbor[:, :-1] = np.logical_or(has_neighbor[:, :-1], square[:, 1:] > 0)  # right
has_neighbor[1:, :] = np.logical_or(has_neighbor[1:, :], square[:-1, :] > 0)  # above
has_neighbor[:-1, :] = np.logical_or(has_neighbor[:-1, :], square[1:, :] > 0)  # below
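# Optional extension (not part of the original answer): the four checks above
# only cover edge-sharing neighbours, which is why single cells attached to a
# region by just a corner are removed too (as noted in the question update).
# Diagonal neighbours can be included in exactly the same way:
has_neighbor[1:, 1:] = np.logical_or(has_neighbor[1:, 1:], square[:-1, :-1] > 0)    # above-left
has_neighbor[1:, :-1] = np.logical_or(has_neighbor[1:, :-1], square[:-1, 1:] > 0)   # above-right
has_neighbor[:-1, 1:] = np.logical_or(has_neighbor[:-1, 1:], square[1:, :-1] > 0)   # below-left
has_neighbor[:-1, :-1] = np.logical_or(has_neighbor[:-1, :-1], square[1:, 1:] > 0)  # below-right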

square[np.logical_not(has_neighbor)] = 0

That way, looping over the square is performed internally by NumPy, which is rather more efficient than looping in Python. There are two drawbacks to this solution:

  1. If your array is very sparse there may be more efficient ways to check the neighborhood of non-zero points.
  2. If your array is very large, the has_neighbor array might consume too much memory. In this case you could loop over sub-arrays of smaller size (a trade-off between Python loops and vectorization); see the sketch after this list.
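A rough sketch of that sub-array idea (the band size and function name here are made up for illustration): process the array in horizontal bands with a one-row overlap, so that neighbours across band borders are still seen.

def remove_isolated_banded(arr, band=1024):
    """Hypothetical chunked version of the neighbour check above."""
    out = arr.copy()
    n = arr.shape[0]
    for start in range(0, n, band):
        stop = min(start + band, n)
        lo = max(start - 1, 0)   # one-row halo above the band
        hi = min(stop + 1, n)    # one-row halo below the band
        block = arr[lo:hi]
        has_neighbor = np.zeros(block.shape, bool)
        has_neighbor[:, 1:] |= block[:, :-1] > 0   # left
        has_neighbor[:, :-1] |= block[:, 1:] > 0   # right
        has_neighbor[1:, :] |= block[:-1, :] > 0   # above
        has_neighbor[:-1, :] |= block[1:, :] > 0   # below
        # Keep only the rows of this band (drop the halo rows) when zeroing
        out[start:stop][np.logical_not(has_neighbor)[start - lo:stop - lo]] = 0
    return out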

I have no experience with ndimage, so there may be a better solution built in somewhere.
