Find the index of last non-zero value per column of a 2-D array

Question

Given a 2-D array, x, I would like to find the index of the bottom-most non-zero element in each column. For example, if

x = np.array([[1, 2, 3, 4],
              [1, 5, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])

the result should be [2, 1, 0, 2].

One obvious way to do so is

[col.nonzero()[0].max() for col in x.T]

I am, however, a bit concerned that the above may not be optimal in terms of performance. Are there any more efficient ways to do so?

Answer 1

NumPy does not provide a direct way to do this (np.argmin and np.argmax return the first occurrence, not the last), so a trick is needed.

Finding the last non-zero value in each column is the same as finding the first non-zero value in each column once the row order is reversed.

Here is an implementation using np.argmax to find that first occurrence in the reversed array:

(x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0)

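This only gives a correct result if every column contains at least one non-zero value. For an all-zero column, np.argmax returns 0 on the reversed mask, so the expression yields x.shape[0]-1 instead of signalling that nothing was found.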
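As a rough illustration of that caveat, the sketch below applies the same expression to a variant of the question's array whose last column is all zeros. The names mask, last_nonzero and has_nonzero are only illustrative and do not come from the answer.

```python
import numpy as np

# Variant of the question's array; the last column contains only zeros.
x = np.array([[1, 2, 3, 0],
              [1, 5, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])

mask = x != 0

# First True in each column of the row-reversed mask, mapped back to
# an index in the original row order.
last_nonzero = (x.shape[0] - 1) - np.argmax(mask[::-1, :], axis=0)

# Columns with no non-zero entry: the value computed above is not
# meaningful for them, so they need to be detected separately.
has_nonzero = mask.any(axis=0)

print(last_nonzero)  # the all-zero column reports 3, the last row
print(has_nonzero)   # False marks the column whose result is misleading
```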

Answer 2

To establish which method is more efficient, it's important to compare how several implementations perform as the input data (in this case, a 2-D array) scales.

For the sake of consistency, every approach below returns -1 for a column that contains no non-zero element.

The code in the question, [col.nonzero()[0].max() for col in x.T], doesn't handle that special case: for a column containing only zeros, col.nonzero()[0] is an empty array, and calling .max() on it raises an exception.
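A minimal sketch of that failure mode, assuming a single all-zero column of length 4 (the variable names are illustrative):

```python
import numpy as np

col = np.zeros(4, dtype=int)   # a column containing only zeros
rows = col.nonzero()[0]        # index array of non-zero entries: empty here

try:
    rows.max()                 # reducing an empty array with max() fails
    raised = False
except ValueError:
    raised = True

print(rows.size, raised)
```

Checking rows.size before calling .max(), as approach B below does, is one way to avoid the exception.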

Possible approaches

A.1: use only built-in Python functions, ignoring NumPy

[max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in zip(*x)]

A.2: use built-in Python functions, iterating over the transposed array numpy.ndarray.T

[max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in x.T]

B: use the numpy.ndarray.nonzero method on the transposed array, in a comprehension over its columns

[c.max() if (c:=col.nonzero()[0]).size else -1 for col in x.T]

C.1: use neither transposition nor a comprehension, just the numpy.argmin function over the reversed columns

np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmin(x[::-1,:]==0, axis=0))

C.2: use neither transposition nor a comprehension, just the numpy.argmax function over the reversed columns

np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0))
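For readability, the vectorized approach can be wrapped in a small function and checked against the per-column approach B. This is only a sketch: the function name last_nonzero_row is hypothetical, and it uses mask.any(axis=0) in place of the count_nonzero test, which should be equivalent for this purpose.

```python
import numpy as np

def last_nonzero_row(x):
    """Row index of the last non-zero entry per column, -1 if none.

    Hypothetical helper name; same idea as approach C.2.
    """
    x = np.asarray(x)
    mask = x != 0
    # First True in the row-reversed mask, mapped back to original order.
    idx = (x.shape[0] - 1) - np.argmax(mask[::-1, :], axis=0)
    # Replace the result for all-zero columns with the -1 sentinel.
    return np.where(mask.any(axis=0), idx, -1)

x = np.array([[0, 7, 0],
              [0, 0, 0],
              [4, 0, 0]])

result = last_nonzero_row(x)

# Reference result computed column by column (approach B).
reference = [c.max() if (c := col.nonzero()[0]).size else -1 for col in x.T]

print(result, reference)
```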

Performance comparison

In the following comparison, these approaches are run on input data that grows exponentially in size.

The input data is a square 2-D NumPy array whose elements are randomly generated integers, with a 50% probability of being 0 and a 50% probability of being a number from 1 to 9 (equally distributed).

import numpy as np
import perfplot

perfplot.bench(
    n_range=[2**k for k in range(1, 14)],  # 2, 4, 8, 16, 32, ..., 2k, 4k, 8k
    setup=lambda n: np.random.choice(range(10), size=(n, n), p=[0.5]+[0.5/9]*9),
    kernels=[
        lambda x: [max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in zip(*x)],
        lambda x: [max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in x.T],
        lambda x: [c.max() if (c:=col.nonzero()[0]).size else -1 for col in x.T],
        lambda x: np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmin(x[::-1,:]==0, axis=0)),
        lambda x: np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0)),
    ],
    labels=["A.1", "A.2", "B", "C.1", "C.2"],
).show()

These results show that you should go with approach C.1 or C.2 if the 2-D array is big enough. The threshold seems to be around a 10x10 array.

So, it will depend on your data: if the 2-D array is about the size of the one you provided as an example (6x6 or smaller), you should consider using A.1, A.2 or even the initially proposed approach, B.

(In that case, further measurements should be made in that size range to verify whether the result is independent of hardware, etc.)
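One possible way to run that closer examination is with the standard-library timeit module on a handful of small sizes. The sketch below is only a starting point: the size list, the repeat counts, and the function names approach_b and approach_c2 are arbitrary choices, and the timings it prints will differ between machines.

```python
import timeit
import numpy as np

def approach_b(x):
    # Per-column nonzero() in a comprehension (approach B).
    return [c.max() if (c := col.nonzero()[0]).size else -1 for col in x.T]

def approach_c2(x):
    # Vectorized argmax over the row-reversed array (approach C.2).
    return np.where(np.count_nonzero(x, axis=0) == 0, -1,
                    (x.shape[0] - 1) - np.argmax(x[::-1, :] != 0, axis=0))

rng = np.random.default_rng(0)  # fixed seed so the inputs are reproducible
timings = {}
for n in (2, 4, 6, 8, 12, 16):
    # Same distribution as the benchmark: 50% zeros, 1..9 equally likely.
    x = rng.choice(10, size=(n, n), p=[0.5] + [0.5 / 9] * 9)
    timings[n] = {
        "B": min(timeit.repeat(lambda: approach_b(x), number=200, repeat=3)),
        "C.2": min(timeit.repeat(lambda: approach_c2(x), number=200, repeat=3)),
    }

for n, t in timings.items():
    print(n, t)
```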
