
Increasing performance of highly repeated numpy array index operations

In my program code I have numpy value arrays and numpy index arrays. Both are preallocated and predefined during program initialization.
Each part of the program has one array values on which calculations are performed, plus three index arrays: idx_from_exch, idx_values and idx_to_exch. There is one global value array for exchanging values between the parts: exch_arr.
The index arrays usually hold between 2 and 5 indices; seldom (most probably never) are more indices needed. Their dtype=np.int32, shape and values are constant during the whole program run, so I set ndarray.flags.writeable=False after initialization (this is optional). The index values of idx_values and idx_to_exch are sorted in numerical order; idx_from_exch may be sorted, but there is no way to guarantee that. All index arrays corresponding to one value array/part have the same shape.
The values arrays and exch_arr usually have between 50 and 1000 elements. Their shape and dtype=np.float64 are constant during the whole program run; the values of the arrays change in each iteration.
Here are the example arrays:

import numpy as np
import numba as nb

values = np.random.rand(100) * 100  # just some random numbers
exch_arr = np.random.rand(60) * 3  # just some random numbers
idx_values = np.array((0, 4, 55, -1), dtype=np.int32)  # sorted but varying steps
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)  # sorted and constant steps!
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32)  # not sorted and varying steps

The example indexing operations look like this:

values[idx_values] = exch_arr[idx_from_exch]  # get values from exchange array
values *= 1.1  # some inplace array operations, this is just a dummy for more complex things
exch_arr[idx_to_exch] = values[idx_values]  # pass some values back to exchange array
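If the temporary array allocated by exch_arr[idx_from_exch] on every iteration matters, the gather can be routed through a scratch buffer preallocated once at init time via np.take(..., out=...). A minimal sketch of the three example operations with such a buffer (gather_buf is a name introduced here, not from the original code; note the numpy docs state the output is still buffered internally under the default mode='raise'):

```python
import numpy as np

values = np.random.rand(100) * 100
exch_arr = np.random.rand(60) * 3
idx_values = np.array((0, 4, 55, -1), dtype=np.int32)
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32)

# Scratch buffer allocated once during initialization, reused every iteration.
gather_buf = np.empty(idx_from_exch.shape[0], dtype=np.float64)

np.take(exch_arr, idx_from_exch, out=gather_buf)  # gather without a fresh temporary
values[idx_values] = gather_buf
values *= 1.1                                     # dummy in-place work
exch_arr[idx_to_exch] = values[idx_values]        # scatter back
```

Whether this beats plain fancy indexing for ~100-element arrays is something to benchmark; the win, if any, comes from reusing the buffer rather than from np.take itself.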

Since these operations are applied once per iteration over several million iterations, speed is crucial. I've looked into many different ways of increasing indexing speed in my previous question, but forgot to be specific enough about my application (especially getting values by indexing with constant index arrays and passing them to another indexed array).
So far fancy indexing seems to be the best way to do it. I'm currently also experimenting with numba's guvectorize, but it seems not to be worth the effort since my arrays are quite small. memoryviews would be nice, but since the index arrays do not necessarily have consistent steps, I know of no way to use memoryviews.

So is there any faster way to do repeated indexing? Perhaps some way of predefining memory-address arrays for each indexing operation, since dtype and shape are always constant? ndarray.__array_interface__ gave me a memory address, but I wasn't able to use it for indexing. I thought about something like:

stride_exch = exch_arr.strides[0]
mem_address = exch_arr.__array_interface__['data'][0]
idx_to_exch = idx_to_exch * stride_exch + mem_address

Is that feasible?
I've also looked into using strides directly with as_strided, but as far as I know only consistent strides are allowed, and my problem would require inconsistent strides.
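One case where a strided view does work without as_strided: an index array with constant steps, like idx_to_exch above, is exactly what a basic slice expresses, and a basic slice is a view that can be built once at initialization. A sketch (the right-hand gather values[idx_values] still copies; only the scatter side becomes a plain strided store):

```python
import numpy as np

values = np.random.rand(100) * 100
exch_arr = np.random.rand(60) * 3
idx_values = np.array((0, 4, 55, -1), dtype=np.int32)
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)  # sorted, constant steps

# Build the equivalent slice once during initialization.
step = int(idx_to_exch[1] - idx_to_exch[0])
to_exch_view = exch_arr[int(idx_to_exch[0]):int(idx_to_exch[-1]) + 1:step]  # a view, no copy

# Per iteration: assign through the view instead of fancy-indexing exch_arr.
to_exch_view[...] = values[idx_values]
```

This only helps on the constant-step side; idx_from_exch and idx_values, with their varying steps, still need integer-array indexing.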

Any help is appreciated! Thanks in advance!


edit:
I just corrected a major error in my example calculation!
The operation values = values * 1.1 changes the memory address of the array. All operations in my program code are laid out so that they do not change the memory addresses of the arrays, because a lot of other operations rely on memoryviews. Thus I replaced the dummy operation with the correct in-place operation: values *= 1.1

One way to get around expensive fancy indexing with numpy boolean arrays is to use numba and skip over the False values in the boolean array.

Example implementation:

@nb.guvectorize(['void(float64[:], float64[:,:], boolean[:], float64[:,:])'],
                '(n),(m,n),(n)->(m,n)', nopython=True, target="cpu")
def test_func(arr1, arr2, inds, res):
    for i in range(arr1.shape[0]):
        if not inds[i]:  # skip masked-out columns; res stays unwritten there
            continue
        for j in range(arr2.shape[0]):
            res[j, i] = arr1[i] + arr2[j, i]

Of course, play around with the numpy data types (smaller byte sizes will run faster) and with target being "cpu" or "parallel".
