Given a 2-D array x, I would like to find the indices of the bottom-most non-zero element in each column. For example, if
x = np.array([[1, 2, 3, 4],
              [1, 5, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])
the result should be [2, 1, 0, 2].
One obvious way to do so is
[col.nonzero()[0].max() for col in x.T]
I am, however, a bit concerned that the above may not be optimal in terms of performance. Are there any more efficient ways to do so?
NumPy does not provide a way to do that directly (np.argmin and np.argmax find the first occurrence, not the last), so a trick is needed.
The key observation is that finding the last non-zero value per column is equivalent to finding the first non-zero value per column once the columns are flipped.
Here is an implementation using np.argmax to find the first non-zero in the flipped array:
(x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0)
Note that this only gives a valid result if every column contains at least one non-zero value.
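As a quick sanity check, the flipped-argmax expression can be applied to the example array from the question:

```python
import numpy as np

# Example array from the question.
x = np.array([[1, 2, 3, 4],
              [1, 5, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])

# Flip the rows, find the first non-zero from the top of the flipped
# array, then map that index back to the original row numbering.
result = (x.shape[0] - 1) - np.argmax(x[::-1, :] != 0, axis=0)
print(result)  # [2 1 0 2]
```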
In order to establish which method is more efficient, it's important to compare how several implementations perform as the input data (in this case, a 2-D array) scales.
For the sake of consistency, if a column contains no non-zero element, the returned index is -1.
The code presented in the question, [col.nonzero()[0].max() for col in x.T], did not consider that special case of a column containing only zeros, where col.nonzero() returns an empty array (and calling .max() on it raises an exception).
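To make that failure mode concrete, here is a minimal sketch (the 2x2 input is just illustrative) showing the unguarded comprehension raising on an all-zero column, and a .size guard returning -1 for it instead:

```python
import numpy as np

x = np.array([[1, 0],
              [0, 0]])  # the second column contains only zeros

# The unguarded comprehension raises on the all-zero column ...
try:
    [col.nonzero()[0].max() for col in x.T]
except ValueError as exc:
    print("failed:", exc)

# ... while checking .size first lets us return -1 for that column.
guarded = [c.max() if (c := col.nonzero()[0]).size else -1 for col in x.T]
print(guarded)  # [0, -1]
```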
A.1: use only built-in Python functions, ignoring NumPy
[max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in zip(*x)]
A.2: use built-in Python functions over the transposed array numpy.ndarray.T
[max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in x.T]
B: use the numpy.ndarray.nonzero
function and the transposed array, in a comprehension over each column
[c.max() if (c:=col.nonzero()[0]).size else -1 for col in x.T]
C.1: use no transposition nor comprehension, just the numpy.argmin
function over the reversed columns
np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmin(x[::-1,:]==0, axis=0))
C.2: use no transposition nor comprehension, just the numpy.argmax
function over the reversed columns
np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0))
In the following comparison, those cases are executed over input data that increases exponentially in size.
The input data is a square 2-D NumPy array whose elements are randomly generated integers, with a 50% probability of being 0 and a 50% probability of being any number from 1 to 9 (equally distributed).
import numpy as np
import perfplot

perfplot.bench(
    n_range=[2**k for k in range(1, 14)],  # 2, 4, 8, 16, 32, ..., 2k, 4k, 8k
    setup=lambda n: np.random.choice(range(10), size=(n, n), p=[0.5]+[0.5/9]*9),
    kernels=[
        lambda x: [max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in zip(*x)],
        lambda x: [max(c) if (c:=[i for i, val in enumerate(col) if val]) else -1 for col in x.T],
        lambda x: [c.max() if (c:=col.nonzero()[0]).size else -1 for col in x.T],
        lambda x: np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmin(x[::-1,:]==0, axis=0)),
        lambda x: np.where(np.count_nonzero(x, axis=0)==0, -1, (x.shape[0]-1) - np.argmax(x[::-1,:]!=0, axis=0)),
    ],
    labels=["A.1", "A.2", "B", "C.1", "C.2"],
).show()
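As an aside, before comparing timings it is worth verifying that the kernels actually agree on the same input. A minimal cross-check of one kernel from each family (the 50x50 size and the seed are arbitrary choices, not part of the benchmark):

```python
import numpy as np

# Arbitrary reproducible input with the same 50%-zero distribution.
rng = np.random.default_rng(0)
x = rng.choice(np.arange(10), size=(50, 50), p=[0.5] + [0.5 / 9] * 9)

# A.1: pure-Python comprehension over transposed rows.
a1 = [max(c) if (c := [i for i, v in enumerate(col) if v]) else -1
      for col in zip(*x)]
# B: per-column nonzero() with a .size guard.
b = [c.max() if (c := col.nonzero()[0]).size else -1 for col in x.T]
# C.1: vectorized argmin over the reversed columns.
c1 = np.where(np.count_nonzero(x, axis=0) == 0, -1,
              (x.shape[0] - 1) - np.argmin(x[::-1, :] == 0, axis=0))

# All approaches should produce identical indices.
assert a1 == b == list(c1)
print("all approaches agree")
```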
Those results show that you should go with approach C.1 or C.2 if the 2-D array is big enough; the threshold seems to be around a 10x10 size.
So, it will depend on your data: if the size of the 2-D array is like the one you provided as an example (6x6 or smaller), you should consider using A.1, A.2, or even the initially proposed approach B.
(In that case, further examination should be done in that size range to verify whether the result is invariant to hardware, etc.)