How to optimise multi dimension numpy array calculation?

Given a 5D array, the objective is to calculate the difference between two arrays extracted from it. For simplicity, say the difference is measured along the second axis, at two positions denoted bt and lf. The values of these two arrays can be extracted as follows:

arr[ep, bt, mt, bd, :] - arr[ep, lf, mt, bd, :]

Note that in the above, the indices for the first (ep), third (mt) and fourth (bd) axes are the same for both arrays; only the position index on the second axis differs (bt and lf).

Based on this requirement, the following code was proposed, packed into the function nested_for_loop:

import numpy as np
from joblib import Parallel, delayed
np.random.seed(0)

ub_lb_pair = np.tril_indices(5, -1)

arr = np.random.randn(3, 5, 4, 3, 2)
my_shape = arr.shape

def nested_for_loop():
    store_cval = np.full([my_shape[0], 10, my_shape[2], my_shape[3], my_shape[4]],
                         np.nan)  # preallocate
    for ep in range(0, my_shape[0]):
        for mt in range(0, my_shape[2]):
            for bd in range(0, my_shape[3]):
                for idx,(bt, lf) in enumerate(zip(ub_lb_pair[0], ub_lb_pair[1])):
                    store_cval[ep, idx, mt, bd, :] = arr[ep, bt, mt, bd, :] - \
                                                     arr[ep, lf, mt, bd, :]
    return store_cval


store_cval = nested_for_loop()
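
For reference, the (bt, lf) pairs that the innermost loop walks over can be inspected directly. This is a minimal sketch that only assumes the np.tril_indices(5, -1) call from the code above; the names rows, cols and pairs are illustrative and not part of the original code.

```python
import numpy as np

# Strictly lower-triangular indices of a 5x5 matrix:
# every (bt, lf) combination with bt > lf.
rows, cols = np.tril_indices(5, -1)
pairs = list(zip(rows.tolist(), cols.tolist()))

print(len(pairs))  # 10, which is where the hard-coded 10 in the preallocation comes from
print(pairs[:3])   # [(1, 0), (2, 0), (2, 1)]
```

In general an axis of length n yields n * (n - 1) / 2 pairs, so the size of the second axis of the result could be taken from len(rows) rather than hard-coded.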

However, I would like to make the code much more compact and efficient if possible.

One approach I can think of is to take advantage of joblib's Parallel, as shown below in the function multi_prop:

def multi_prop(my_arr, ep):
    temp_ = np.full([10, my_shape[2], my_shape[3], my_shape[4]],
                    np.nan)
    for mt in range(0, my_shape[2]):
        for bd in range(0, my_shape[3]):
            for idx, (bt, lf) in enumerate(zip(ub_lb_pair[0], ub_lb_pair[1])):
                temp_[idx, mt, bd, :] = my_arr[ep, bt, mt, bd, :] - my_arr[ep, lf, mt, bd, :]
    return temp_

dist_abs = Parallel(n_jobs=-1)(delayed(multi_prop)(arr, ep) for ep in range(0, my_shape[0]))

dist_abs = np.array(dist_abs)
bb = np.array_equal(store_cval, dist_abs)

But I wonder whether there is a more numpythonic way to achieve the same objective.

You don't really need any loops at all. Imagine this pair of fancy indices:

bt, lf = np.tril_indices(5, -1)

You are looking for

store_cval = arr[:, bt] - arr[:, lf]
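
A quick way to convince yourself that this one-liner matches the loop-based version is to compare the two on random data. The sketch below is self-contained; the names vectorised and looped and the use of np.random.default_rng are choices made for this illustration, not part of the question's code.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((3, 5, 4, 3, 2))
bt, lf = np.tril_indices(arr.shape[1], -1)

# Vectorised: index arrays on axis 1 select all pairs at once.
vectorised = arr[:, bt] - arr[:, lf]

# Reference: explicit loop over the pairs only; the other axes are sliced.
looped = np.empty_like(vectorised)
for idx, (b, l) in enumerate(zip(bt, lf)):
    looped[:, idx] = arr[:, b] - arr[:, l]

print(vectorised.shape)                    # (3, 10, 4, 3, 2)
print(np.array_equal(vectorised, looped))  # True
```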

Keep in mind that store_cval[ep, idx, mt, bd, :] = arr[ep, bt, mt, bd, :] - arr[ep, lf, mt, bd, :] is an implicit loop over the last index. They're all loops under the hood, and you don't need to write any of them explicitly.

A more general solution:

def diffs(arr, axis):
    a, b = np.tril_indices(arr.shape[axis], -1)
    ind1 = [slice(None) for _ in range(arr.ndim)]
    ind2 = ind1.copy()
    ind1[axis] = a
    ind2[axis] = b
    return arr[tuple(ind1)] - arr[tuple(ind2)]
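
As a usage sketch, the function can be applied to the array shape from the question. The function is repeated here only so the snippet runs on its own; the variable names out_axis1 and out_axis2 are illustrative.

```python
import numpy as np

def diffs(arr, axis):
    # Same function as above, repeated so this snippet is self-contained.
    a, b = np.tril_indices(arr.shape[axis], -1)
    ind1 = [slice(None) for _ in range(arr.ndim)]
    ind2 = ind1.copy()
    ind1[axis] = a
    ind2[axis] = b
    return arr[tuple(ind1)] - arr[tuple(ind2)]

arr = np.random.default_rng(0).standard_normal((3, 5, 4, 3, 2))
bt, lf = np.tril_indices(5, -1)

# Along axis 1 this is the same as the one-liner arr[:, bt] - arr[:, lf].
out_axis1 = diffs(arr, 1)
print(np.array_equal(out_axis1, arr[:, bt] - arr[:, lf]))  # True

# Along axis 2 (length 4) there are 4 * 3 / 2 = 6 pairs.
out_axis2 = diffs(arr, 2)
print(out_axis2.shape)  # (3, 5, 6, 3, 2)
```

Because only one axis receives an index array and the rest are full slices, the paired axis stays in its original position in the result.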
