
Find the index of last non-zero value per column of a 2-D array

Given a 2-D array, x, I would like to find the row index of the bottom-most non-zero element in each column. For example, if

x = np.array([[1, 2, 3, 4],
              [1, 5, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])

the result should be [2, 1, 0, 2].

One obvious way to do so is

 [col.nonzero()[0].max() for col in x.T]

I am, however, a bit concerned that the above may not be optimal in terms of performance. Are there any more efficient ways to do so?

NumPy does not provide a direct way to do this (np.argmin and np.argmax return the first occurrence, not the last), so a trick is needed.

The key observation is that finding the last non-zero value in each column is the same as finding the first non-zero value in each column once the row order is reversed (that is, each column is flipped top to bottom).

Here is an implementation that uses np.argmax on the reversed array to find that first non-zero value, then maps the position back to the original row index:

(x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0)

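This is only correct if every column contains at least one non-zero value. For an all-zero column, np.argmax returns 0, so the expression yields x.shape[0]-1 instead of signaling that there is no non-zero element.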
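To make the trick concrete, here is a minimal sketch that breaks the one-liner into steps on the example array from the question. The intermediate names (nonzero_from_bottom, first_from_bottom, all_zero) are only illustrative and do not appear in the original answer.

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [1, 5, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])

# Reverse the row order and mark the non-zero entries.
nonzero_from_bottom = x[::-1, :] != 0

# argmax on a boolean array returns the position of the first True,
# i.e. the distance of the last non-zero element from the bottom row.
first_from_bottom = np.argmax(nonzero_from_bottom, axis=0)

# Convert that distance back to an index in the original row order.
last_nonzero = (x.shape[0] - 1) - first_from_bottom
print(last_nonzero)  # [2 1 0 2]

# Caveat: an all-zero column has no True at all, so argmax returns 0
# and the formula reports the last row index instead of a sentinel.
all_zero = np.zeros((4, 1), dtype=int)
misleading = (all_zero.shape[0] - 1) - np.argmax(all_zero[::-1, :] != 0, axis=0)
print(misleading)  # [3]
```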

To establish which method is more efficient, it's important to compare how several implementations perform as the input data (in this case, a 2-D array) scales.

For the sake of consistency, every approach below returns -1 as the index for a column that contains no non-zero element.

The code presented in the question, [col.nonzero()[0].max() for col in x.T], does not handle that special case: for a column containing only zeros, col.nonzero()[0] is an empty array, and calling .max() on it raises an exception.
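The failure is easy to reproduce on a small array with one all-zero column. The sketch below uses a made-up 2x2 input and illustrative variable names; NumPy raises a ValueError when a reduction such as max() is applied to an empty array.

```python
import numpy as np

# Hypothetical input: the second column contains only zeros.
x = np.array([[1, 0],
              [0, 0]])

try:
    result = [col.nonzero()[0].max() for col in x.T]
except ValueError as exc:
    # max() of an empty index array has no identity value.
    result = None
    error_message = str(exc)

# Guarded version: fall back to -1 when a column has no non-zero element.
guarded = [int(idx.max()) if (idx := col.nonzero()[0]).size else -1
           for col in x.T]
print(guarded)  # [0, -1]
```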

Possible approaches

A.1: use only built-in Python functions, ignoring NumPy

[max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in zip(*x)]

A.2: use built-in Python functions over the transposed NumPy array (numpy.ndarray.T)

[max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in x.T]

B: use the numpy.ndarray.nonzero function on the transposed array, in a comprehension over the columns

[c.max() if (c:=col.nonzero()[0]).size else -1 for col in x.T]

C.1: use neither transposition nor a comprehension, just the numpy.argmin function over the reversed columns

np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmin(x[::-1,:]==0, axis=0))

C.2: use neither transposition nor a comprehension, just the numpy.argmax function over the reversed columns

np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0))
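Before timing the approaches, it is worth checking that they agree, including on an all-zero column. The sketch below wraps approach B and the argmax-based vectorized approach in functions; the function names and the test array are illustrative, and the vectorized version uses (x != 0).any(axis=0) as an assumed equivalent of the count_nonzero test.

```python
import numpy as np

def last_nonzero_loop(x):
    """Per-column nonzero() with a guard for all-zero columns (approach B)."""
    return [int(c.max()) if (c := col.nonzero()[0]).size else -1
            for col in x.T]

def last_nonzero_vectorized(x):
    """argmax over the row-reversed array, with -1 for all-zero columns."""
    has_nonzero = (x != 0).any(axis=0)
    idx = (x.shape[0] - 1) - np.argmax(x[::-1, :] != 0, axis=0)
    return np.where(has_nonzero, idx, -1)

# Example array from the question, with the third column zeroed out.
x = np.array([[1, 2, 0, 4],
              [1, 5, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])

print(last_nonzero_loop(x))        # [2, 1, -1, 2]
print(last_nonzero_vectorized(x))  # [ 2  1 -1  2]
```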

Performance comparison

In the following comparison, those approaches are executed over input data that grows exponentially in size.

The input data is a square 2-D NumPy array whose elements are randomly generated integers, with a 50% probability of being 0 and a 50% probability of being any number from 1 to 9 (equally distributed).

import numpy as np
import perfplot

perfplot.bench(
    n_range=[2**k for k in range(1, 14)],  # 2, 4, 8, 16, 32, ..., 2k, 4k, 8k
    setup=lambda n: np.random.choice(range(10), size=(n, n), p=[0.5]+[0.5/9]*9),
    kernels=[
        lambda x: [max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in zip(*x)],
        lambda x: [max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in x.T],
        lambda x: [c.max() if (c:=col.nonzero()[0]).size else -1 for col in x.T],
        lambda x: np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmin(x[::-1,:]==0, axis=0)),
        lambda x: np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0)),
    ],
    labels=["A.1", "A.2", "B", "C.1", "C.2"],
).show()

These results show that you should go with approach C.1 or C.2 if the 2-D array is big enough. The threshold seems to be around a 10x10 array.

So, it will depend on your data: if the 2-D array is about the size of the one you provided as an example (6x6 or smaller), you should consider using A.1, A.2 or even the initially proposed approach B.

(In that case, that size range should be examined further to verify whether the result is invariant to hardware, etc.)
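One way to examine the small-size range on your own machine is a quick timeit loop. This is only a rough sketch: the function names, the chosen sizes, and the repeat counts are assumptions, and the absolute numbers will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

def approach_a1(x):
    # Pure Python over the columns obtained with zip(*x).
    return [max(c) if (c := [i for i, val in enumerate(col) if val]) else -1
            for col in zip(*x)]

def approach_b(x):
    # Per-column nonzero() with a guard for all-zero columns.
    return [int(c.max()) if (c := col.nonzero()[0]).size else -1
            for col in x.T]

def approach_c2(x):
    # Vectorized argmax over the row-reversed array.
    return np.where(np.count_nonzero(x, axis=0) == 0, -1,
                    (x.shape[0] - 1) - np.argmax(x[::-1, :] != 0, axis=0))

rng = np.random.default_rng(0)
kernels = {"A.1": approach_a1, "B": approach_b, "C.2": approach_c2}
timings = {}
for n in (2, 4, 6, 8, 12, 16):
    # Same distribution as in the benchmark: 50% zeros, 1..9 otherwise.
    x = rng.choice(10, size=(n, n), p=[0.5] + [0.5 / 9] * 9)
    timings[n] = {
        label: min(timeit.repeat(lambda: fn(x), number=100, repeat=3))
        for label, fn in kernels.items()
    }

for n, row in timings.items():
    fastest = min(row, key=row.get)
    print(f"{n}x{n}: fastest = {fastest}")
```

Taking the minimum over several repeats reduces noise from other processes, but for arrays this small the differences can still be within measurement error.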

