
Multi-step linear algebra operations on multiple numpy arrays

I have 3 numpy arrays:

import numpy
arr_a = numpy.random.random((300, 300))
arr_b = numpy.random.random((300, 300))
arr_c = numpy.random.random((300, 300))

I want to create a 4th array (arr_d) from a combination of the 3 arrays. The rules are as follows:

If the arr_a cell value > 0.2, the arr_b value < 0.4 and the arr_c value > 0.6, fill arr_d with 1.

If the arr_a cell value > 0.3, the arr_b value < 0.5 and the arr_c value > 0.6, fill arr_d with 2.

If the arr_a cell value > 0.1, the arr_b value < 0.2 and the arr_c value > 0.5, fill arr_d with 3.

In all other cases, fill arr_d with 4.

I can do this using nested loops, but that is very slow and not very Pythonic. Also, this is a test case; the real arrays are 1000 × 1000, so I want a scalable solution, preferably a parallelizable one.
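The rules overlap: a cell with arr_a = 0.35, arr_b = 0.3 and arr_c = 0.7 satisfies both the first and the second rule, so the result depends on which rule is checked first. The question does not say which should win. The per-cell reference below is a minimal sketch that assumes the first matching rule wins, which is what an if/elif chain does; classify_cell is a hypothetical helper name.

```python
def classify_cell(a, b, c):
    """Return the arr_d value for one cell (assumes the first matching rule wins)."""
    if a > 0.2 and b < 0.4 and c > 0.6:
        return 1
    if a > 0.3 and b < 0.5 and c > 0.6:
        return 2
    if a > 0.1 and b < 0.2 and c > 0.5:
        return 3
    return 4


print(classify_cell(0.35, 0.30, 0.70))  # 1 (rules 1 and 2 both match; rule 1 is checked first)
print(classify_cell(0.35, 0.45, 0.70))  # 2 (arr_b is too large for rule 1)
print(classify_cell(0.15, 0.10, 0.55))  # 3
print(classify_cell(0.05, 0.10, 0.90))  # 4 (no rule matches)
```

Any vectorized version should agree with this function cell by cell, which makes it a convenient check on small arrays.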

Using pure Python and for loops is definitely not the way to go. You can write your program using array operations in NumPy, which effectively does the looping in C and speeds up the code enormously. However, this instantiates a whole new array for each of your rules (and for each comparison inside a rule), each the same size as your data. Instead, you could use something like Numba, which comes with, e.g., the Anaconda distribution of Python. With Numba you can write your code using loops but without the time penalty, because it compiles your code to native machine instructions. Also, no additional large arrays are needed, making it much more memory efficient than NumPy. Numba also happens to be faster, as this example shows:

import time

import numba
import numpy

def using_numpy(shape):
    arr_a = numpy.random.random(shape)
    arr_b = numpy.random.random(shape)
    arr_c = numpy.random.random(shape)
    mask1 = numpy.logical_and(numpy.logical_and((arr_a > 0.2), (arr_b < 0.4)), (arr_c > 0.6))
    mask2 = numpy.logical_and(numpy.logical_and((arr_a > 0.3), (arr_b < 0.5)), (arr_c > 0.6))
    mask3 = numpy.logical_and(numpy.logical_and((arr_a > 0.1), (arr_b < 0.2)), (arr_c > 0.5))
    result = numpy.ones(arr_a.shape)*4
    # Assign in reverse order so that rule 1 wins where the rules overlap,
    # matching the if/elif chain in using_numba below
    result[mask3] = 3
    result[mask2] = 2
    result[mask1] = 1
    return result

@numba.njit
def using_numba(shape):
    arr_a = numpy.random.random(shape)
    arr_b = numpy.random.random(shape)
    arr_c = numpy.random.random(shape)
    result = numpy.empty(shape)
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            # The first matching rule wins; anything unmatched gets 4
            if arr_a[i, j] > 0.2 and arr_b[i, j] < 0.4 and arr_c[i, j] > 0.6:
                result[i, j] = 1
            elif arr_a[i, j] > 0.3 and arr_b[i, j] < 0.5 and arr_c[i, j] > 0.6:
                result[i, j] = 2
            elif arr_a[i, j] > 0.1 and arr_b[i, j] < 0.2 and arr_c[i, j] > 0.5:
                result[i, j] = 3
            else:
                result[i, j] = 4
    return result

# Call once with an empty shape so that Numba compiles using_numba before it is timed
using_numba((0, 0))

t0 = time.time()
result = using_numpy((3000, 3000))
print('NumPy took', time.time() - t0, 'seconds')

t0 = time.time()
result = using_numba((3000, 3000))
print('Numba took', time.time() - t0, 'seconds')

Here I have used (3000, 3000) arrays. On my machine, using NumPy takes 0.47 seconds while using Numba takes 0.29 seconds.
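The question also asks for something parallelizable. A minimal sketch, assuming plain NumPy plus the standard-library concurrent.futures module (neither answer uses it): split the rows into blocks, classify each block on a thread pool, and write into one preallocated output. Each block only needs block-sized temporaries, which addresses the memory concern above, and NumPy releases the GIL inside many of its elementwise loops, so the threads can overlap. Whether this beats the single-threaded versions depends on the machine and its memory bandwidth, so time it. classify_block, classify_parallel, rows_per_block and workers are hypothetical names, and the int8 output dtype is an assumption (the values only range from 1 to 4).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy


def classify_block(arr_a, arr_b, arr_c, out, start, stop):
    """Fill out[start:stop] in place; temporaries are only block-sized."""
    a, b, c = arr_a[start:stop], arr_b[start:stop], arr_c[start:stop]
    block = out[start:stop]  # a view, so writes land in `out`
    block[...] = 4           # default for cells that match no rule
    # Lowest-priority rule first, so rule 1 overwrites the others where they overlap
    block[(a > 0.1) & (b < 0.2) & (c > 0.5)] = 3
    block[(a > 0.3) & (b < 0.5) & (c > 0.6)] = 2
    block[(a > 0.2) & (b < 0.4) & (c > 0.6)] = 1


def classify_parallel(arr_a, arr_b, arr_c, rows_per_block=256, workers=4):
    """Classify every cell using a thread pool over row blocks (hypothetical helper)."""
    out = numpy.empty(arr_a.shape, dtype=numpy.int8)
    n_rows = arr_a.shape[0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(classify_block, arr_a, arr_b, arr_c, out,
                        start, min(start + rows_per_block, n_rows))
            for start in range(0, n_rows, rows_per_block)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return out
```

With the question's arrays, arr_d = classify_parallel(arr_a, arr_b, arr_c) fills arr_d with the same values as an if/elif loop over every cell. Numba can also parallelize its loop version, through numba.prange together with parallel=True.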

One way would be to use boolean masks:

condition_1 = numpy.logical_and(numpy.logical_and((arr_a > 0.2), (arr_b < 0.4)), (arr_c > 0.6))
condition_2 = numpy.logical_and(numpy.logical_and((arr_a > 0.3), (arr_b < 0.5)), (arr_c > 0.6))
condition_3 = numpy.logical_and(numpy.logical_and((arr_a > 0.1), (arr_b < 0.2)), (arr_c > 0.5))
result = numpy.ones(arr_a.shape) * 4
# Assign the lowest-priority rule first so that rule 1 wins where the rules overlap
result[condition_3] = 3
result[condition_2] = 2
result[condition_1] = 1

It avoids nested loops, but it allocates three dedicated arrays and makes a lot of superfluous assignments. There has to be a better approach...
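One candidate is numpy.select. It takes a list of boolean conditions and a matching list of choices and, for each cell, returns the choice belonging to the first condition that is True, or the default if none is. That encodes the rule priority directly instead of relying on the order of the assignments, and it drops the numpy.where calls. It still builds the three boolean masks, so memory use is similar to the boolean-mask approach. A minimal sketch using the question's array sizes:

```python
import numpy

arr_a = numpy.random.random((300, 300))
arr_b = numpy.random.random((300, 300))
arr_c = numpy.random.random((300, 300))

conditions = [
    (arr_a > 0.2) & (arr_b < 0.4) & (arr_c > 0.6),  # rule 1
    (arr_a > 0.3) & (arr_b < 0.5) & (arr_c > 0.6),  # rule 2
    (arr_a > 0.1) & (arr_b < 0.2) & (arr_c > 0.5),  # rule 3
]
# For each cell, take the value of the first True condition; 4 if none is True
arr_d = numpy.select(conditions, [1, 2, 3], default=4)
```

Here arr_d comes out as an integer array, whereas numpy.ones(...) * 4 produces floats.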

