Increasing performance of highly repeated numpy array index operations

Question

In my program code I've got numpy value arrays and numpy index arrays. Both are preallocated and predefined during program initialization.
Each part of the program has one array values on which calculations are performed, and three index arrays idx_from_exch , idx_values and idx_to_exch . There is on global value array to exchange the values of several parts: exch_arr .
The index arrays have between 2 and 5 indices most of the times, seldomly (most probably never) more indices are needed. dtype=np.int32 , shape and values are constant during the whole program run. Thus I set ndarray.flags.writeable=False after initialization, but this is optional. The index values of the index arrays idx_values and idx_to_exch are sorted in numerical order, idx_source may be sorted , but there is no way to define that. All index arrays corresponding to one value array/part have the same shape .
The values arrays and also the exch_arr usually have between 50 and 1000 elements. shape and dtype=np.float64 are constant during the whole program run, the values of the arrays change in each iteration.
Here are the example arrays:

import numpy as np
import numba as nb

values = np.random.rand(100) * 100  # just some random numbers
exch_arr = np.random.rand(60) * 3  # just some random numbers
idx_values = np.array((0, 4, 55, -1), dtype=np.int32)  # sorted but varying steps
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)  # sorted and constant steps!
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32)  # not sorted and varying steps

The example indexing operations look like this:

values[idx_values] = exch_arr[idx_from_exch]  # get values from exchange array
values *= 1.1  # some inplace array operations, this is just a dummy for more complex things
exch_arr[idx_to_exch] = values[idx_values]  # pass some values back to exchange array

Since these operations are being applied once per iteration for several million iterations, speed is crucial. I've been looking into many different ways of increasing indexing speed in my previous question , but forgot to be specific enough considering my application (especially getting values by indexing with constant index arrays and passing them to another indexed array).
The best way to do it seems to be fancy indexing so far. I'm currently also experimenting with numba guvectorize , but it seems that it is not worth the effort since my arrays are quite small. memoryviews would be nice, but since the index arrays do not necessarily have consistent steps, I know of no way to use memoryviews .

So is there any faster way to do repeated indexing? Some way of predefining memory address arrays for each indexing operation, as dtype and shape are always constant? ndarray.__array_interface__ gave me a memory address, but I wasn't able to use it for indexing. I thought about something like:

stride_exch = exch_arr.strides[0]
mem_address = exch_arr.__array_interface__['data'][0]
idx_to_exch = idx_to_exch * stride_exch + mem_address

Is that feasible?
I've also been looking into using strides directly with as_strided , but as far as I know only consistent strides are allowed and my problem would require inconsistent strides .

Any help is appreciated! Thanks in advance!

edit:
I just corrected a massive error in my example calculation!
The operation values = values * 1.1 changes the memory address of the array. All my operations in the program code are layed out to not change the memory address of the arrays, because alot of other operations rely on using memoryviews. Thus I replaced the dummy operation with the correct in-place operation: values *= 1.1

Answer 1

One solution to getting round expensive fancy indexing using numpy boolean arrays is using numba and skipping over the False values in your numpy boolean array.

Example implementation:

@numba.guvectorize(['float64[:], float64[:,:], float64[:]'], '(n),(m,n)->(m)', nopython=True, target="cpu")
def test_func(arr1, arr2, inds, res):
    for i in range(arr1.shape[0]):
        if not inds[i]:
            continue
        for j in range(arr2.shape[0]):
            res[j, i] = arr1[i] + arr2[j, i]

Of course, play around with the numpy data types (smaller byte sizes will run faster) and target being "cpu" or "parallel" .

Increasing performance of highly repeated numpy array index operations

Question

1 answers

solution1
0 2022-01-01 17:42:07

Increasing performance of highly repeated numpy array index operations

Question

1 answers

solution1 0 2022-01-01 17:42:07

solution1
0 2022-01-01 17:42:07