How can I improve the efficiency of this numpy loop

I've got a NumPy array containing labels. I'd like to compute a number for each label based on its size and bounding box. How can I write this more efficiently so that it's realistic to use on large arrays (~15000 labels)?

from numpy import array, zeros, argwhere

A = array([[ 1, 1, 0, 3, 3],
           [ 1, 1, 0, 0, 0],
           [ 1, 0, 0, 2, 2],
           [ 1, 0, 2, 2, 2]])

B = zeros(4)

for label in range(1, 4):
    # get the bounding box of the label
    label_points = argwhere(A == label)
    (y0, x0), (y1, x1) = label_points.min(0), label_points.max(0) + 1

    # assume I've computed the size of each label in a numpy array size_A
    B[label] = myfunc(y0, x0, y1, x1, size_A[label])

I wasn't really able to implement this efficiently using NumPy's vectorised functions, so maybe a clever pure-Python implementation will be faster.

def first_row(a, labels):
    d = {}
    d_setdefault = d.setdefault
    len_ = len
    num_labels = len_(labels)
    for i, row in enumerate(a):
        for label in row:
            d_setdefault(label, i)
        if len_(d) == num_labels:
            break
    return d

This function returns a dictionary mapping each label to the index of the first row it appears in. Applying the function to A, A.T, A[::-1] and A.T[::-1] also gives you the first column as well as the last row and column.

If you would rather have a list than a dictionary, you can convert it with list(map(d.get, labels)). Alternatively, you can use a NumPy array instead of a dictionary from the start, but you will lose the ability to leave the loop early as soon as all labels have been found.
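
For instance, a minimal sketch of how the four calls could be combined into bounding boxes (labels must cover every distinct value in A, including the background 0, so the early exit is reliable; myfunc and size_A are the placeholders from the question):

labels = [0, 1, 2, 3]           # every distinct value in A, including background
y0 = first_row(A, labels)       # first row per label
x0 = first_row(A.T, labels)     # first column per label
# a first occurrence in the flipped array corresponds to the last row/column;
# subtracting from the length turns it into an exclusive upper bound
y1 = {k: len(A) - v for k, v in first_row(A[::-1], labels).items()}
x1 = {k: len(A.T) - v for k, v in first_row(A.T[::-1], labels).items()}

for label in (1, 2, 3):
    B[label] = myfunc(y0[label], x0[label], y1[label], x1[label], size_A[label])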

I'd be interested to hear whether (and by how much) this actually speeds up your code, but I'm confident it's faster than your original solution.

Algorithm:

  1. flatten the array to one dimension
  2. get the sort index with argsort()
  3. get the sorted version of the one-dimensional array as sorted_A
  4. use where() and diff() to find the positions in sorted_A where the label changes
  5. use those change positions and the sort index to recover the original one-dimensional positions of each label
  6. calculate the two-dimensional locations from the one-dimensional positions

For a large array such as (7000, 9000), it can finish the calculation in 30 s.

Here is the code:

import numpy as np
from itertools import zip_longest

A = np.array([[ 1, 1, 0, 3, 3],
              [ 1, 1, 0, 0, 0],
              [ 1, 0, 0, 2, 2],
              [ 1, 0, 2, 2, 2]])

def label_range(A):
    h, w = A.shape
    tmp = A.reshape(-1)

    # sort the flattened array so that equal labels become contiguous
    index = np.argsort(tmp)
    sorted_A = tmp[index]
    # positions where the label changes; the first group (the smallest
    # label, i.e. the background 0) is skipped on purpose
    pos = np.where(np.diff(sorted_A))[0] + 1
    for p1, p2 in zip_longest(pos, pos[1:]):
        label_index = index[p1:p2]   # flat positions of one label
        y = label_index // w
        x = label_index % w

        x0 = np.min(x)
        x1 = np.max(x) + 1
        y0 = np.min(y)
        y1 = np.max(y) + 1
        label = tmp[label_index[0]]

        yield label, x0, y0, x1, y1

for label, x0, y0, x1, y1 in label_range(A):
    print("%d:(%d,%d)-(%d,%d)" % (label, x0, y0, x1, y1))

#B = np.random.randint(0, 100, (7000, 9000))
#list(label_range(B))

Another method:

Use bincount() to get the label counts in every row and column, and save the information in the rows and cols arrays.

For each label you then only need to search the range in rows and cols. This is faster than sorting; on my PC the calculation takes only a few seconds.

def label_range2(A):
    maxlabel = np.max(A) + 1
    h, w = A.shape

    # rows[i, label] is True if the label occurs anywhere in row i
    rows = np.zeros((h, maxlabel), bool)
    for row in range(h):
        rows[row, :] = np.bincount(A[row, :], minlength=maxlabel) > 0

    # cols[j, label] is True if the label occurs anywhere in column j
    cols = np.zeros((w, maxlabel), bool)
    for col in range(w):
        cols[col, :] = np.bincount(A[:, col], minlength=maxlabel) > 0

    for label in range(1, maxlabel):
        y = np.where(rows[:, label])[0]
        x = np.where(cols[:, label])[0]
        x0 = np.min(x)
        x1 = np.max(x) + 1
        y0 = np.min(y)
        y1 = np.max(y) + 1
        yield label, x0, y0, x1, y1
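
A usage example, mirroring the one for label_range above:

for label, x0, y0, x1, y1 in label_range2(A):
    print("%d:(%d,%d)-(%d,%d)" % (label, x0, y0, x1, y1))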

The performance bottleneck seems indeed to be the call to argwhere. It can be avoided by changing the loop as follows (only computing y0 and y1; the generalization to x0 and x1 is sketched after the code):

for label in range(1, 4):
    comp = (A == label)
    yminind = comp.argmax(0)                         # first matching row per column
    ymin = comp.max(0)                               # columns that contain the label
    ymaxind = comp.shape[0] - comp[::-1].argmax(0)   # last matching row + 1 per column
    y0 = yminind[ymin].min()
    y1 = ymaxind[ymin].max()
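
For completeness, a minimal sketch of the same trick applied along axis 1 to obtain x0 and x1, as a direct transcription of the y version above:

for label in range(1, 4):
    comp = (A == label)
    xminind = comp.argmax(1)                            # first matching column per row
    xmin = comp.max(1)                                  # rows that contain the label
    xmaxind = comp.shape[1] - comp[:, ::-1].argmax(1)   # last matching column + 1 per row
    x0 = xminind[xmin].min()
    x1 = xmaxind[xmin].max()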

I'm not sure about the reason for the performance difference, but one reason might be that operations like ==, argmax, and max can preallocate their output array directly from the shape of the input array, which is not possible for argwhere.
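
A small illustration of that point, using the example A from the question: the output shape of == or argmax follows from A.shape alone, while argwhere's output size depends on the data.

comp = (A == 1)
print(comp.shape)               # (4, 5) - determined by A.shape alone
print(comp.argmax(0).shape)     # (5,)   - likewise predictable
print(np.argwhere(comp).shape)  # (6, 2) - depends on how many cells equal 1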

Using PyPy you can just run the loop and not worry about vectorization. It should be fast.
