
Alternative to for loops for boolean / nonzero indexing of a numpy array

I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I am able to do so with a series of 'for' loops that use np.any. This works but seems awkward and slow, so I am investigating a more direct way to accomplish the task.

I am rather new to numpy, so the approaches that I have tried include a) using np.nonzero, which returns indices that I do not know how to use for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but am struggling to understand the differences between them, and cannot get them to return the right values for a 3d array.

Here is my current function that returns a 3D array with nonzero values:

import numpy as np

def real_size(arr3):
    # Indices along each axis whose slice contains at least one nonzero value
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')

    for zero_ in range(arr3.shape[0]):
        if arr3[zero_].any():
            true_0.append(zero_)
    for one_ in range(arr3.shape[1]):
        if arr3[:, one_, :].any():
            true_1.append(one_)
    for two_ in range(arr3.shape[2]):
        if arr3[:, :, two_].any():
            true_2.append(two_)

    # Slice from the first to the last nonzero index on every axis
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4

# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1

#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4) 
>> The nonzero area is: (2, 2, 2)

# So, the array is correct, but likely not the best way to get there:
non_zero

>> array([[[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]]])

The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors, and about what the result would be if the input array has, for example, two separate non-zero 3d areas within the original array.

To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.

Assuming you want to eliminate all rows, columns, etc. that contain only zeros (including any that lie between nonzero values, not just those on the outside), you could do the following:

nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
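The three chained indexing steps above can probably be collapsed into one. The following is a minimal sketch, assuming the same test_array as in the question; the list name keep is only illustrative. np.ix_ accepts boolean sequences and builds an open mesh from them, so every axis is filtered in a single indexing operation:

```python
import numpy as np

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1

nz = test_array != 0
# One boolean mask per axis: True where that plane holds any nonzero value.
keep = [nz.any(axis=(1, 2)), nz.any(axis=(0, 2)), nz.any(axis=(0, 1))]

# np.ix_ combines the per-axis masks so that one indexing operation
# replaces the three chained ones.
non_zero = test_array[np.ix_(*keep)]
print(non_zero.shape)  # (2, 2, 2)
```

Like the chained version, this uses advanced indexing, so the result is a copy rather than a view of test_array.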

An alternative solution using np.nonzero:

# Unique indices along each axis that hold at least one nonzero value
i = [np.unique(idx) for idx in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
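On what to do with the indices from np.nonzero: it returns one index array per axis, and the k-th entries of those arrays together give the coordinates of the k-th nonzero element. If the goal is the bounding box that the loop version in the question computes (which keeps any all-zero planes lying between nonzero values), the minimum and maximum of each index array are enough. A minimal sketch, where the function name bounding_box is hypothetical:

```python
import numpy as np

def bounding_box(arr):
    """Return the smallest sub-array of arr that contains every nonzero value."""
    # One index array per axis, listing the positions of the nonzero elements.
    idx = np.nonzero(arr)
    if idx[0].size == 0:
        raise ValueError("array has no nonzero values")
    # One slice per axis, from the first to the last nonzero index.
    box = tuple(slice(i.min(), i.max() + 1) for i in idx)
    return arr[box]

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1
print(bounding_box(test_array).shape)  # (2, 2, 2)
```

Because this uses plain slices, the result is a view into the original array rather than a copy; call .copy() on it if the original must not be affected by later writes.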

This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):

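def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # True at each position along `axis` whose slice has any nonzero value
        keep = nz.any(axis=tuple(np.delete(axes, axis)))
        # Apply the mask along `axis`, leaving the other axes untouched
        result = result[(slice(None),)*axis + (keep,)]
    return result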

non_zero = real_size(test_array)
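Regarding the concern about two separate nonzero areas: the per-axis masks also drop the empty planes between the areas, so both end up compacted into a single array. If each area should come back as its own array, one option is connected-component labeling. The sketch below assumes SciPy is available (it is not used anywhere in the question) and relies on scipy.ndimage.label and scipy.ndimage.find_objects; the example array and variable names are made up for illustration:

```python
import numpy as np
from scipy import ndimage

arr = np.zeros((6, 6, 6), dtype=int)
arr[0:2, 0:2, 0:2] = 1   # first block
arr[4:6, 3:6, 4:6] = 1   # second block, separated by empty planes

# The per-axis masks remove the empty planes between the blocks too,
# so both blocks land in one compacted array.
nz = arr != 0
keep = [nz.any(axis=(1, 2)), nz.any(axis=(0, 2)), nz.any(axis=(0, 1))]
merged = arr[np.ix_(*keep)]
print(merged.shape)  # (4, 5, 4)

# label() numbers each connected region; find_objects() returns one
# tuple of slices (a bounding box) per label.
labeled, count = ndimage.label(arr)
regions = [arr[box] for box in ndimage.find_objects(labeled)]
print(count, [r.shape for r in regions])  # 2 [(2, 2, 2), (2, 3, 2)]
```

By default label() treats only face-adjacent elements as connected; a different structure argument changes that. Each returned box is a rectangular slice, so if two regions interlock, one region's box can contain values from the other.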
