Slicing a 2D array using arrays of indices in Python

Question

I am working with slices of a 2D numpy array. To select the slices, I have the indices stored in arrays. For example, I have:

mat = np.zeros([xdim,ydim], float)
xmin = np.array([...]) # Array of minimum indices in x
xmax = np.array([...]) # Array of maximum indices in x
ymin = np.array([...]) # Array of minimum indices in y
ymax = np.array([...]) # Array of maximum indices in y
value = np.array([...]) # Values

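where ... just stands for some previously computed integers. All the arrays are well defined and have a length of about 265000. What I want to do is something like: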

mat[xmin:xmax, ymin:ymax] += value

so that, for the first elements, I would have:

mat[xmin[0]:xmax[0], ymin[0]:ymax[0]] += value[0]
mat[xmin[1]:xmax[1], ymin[1]:ymax[1]] += value[1]

and so on, for the ~265000 elements of the arrays. Unfortunately, what I just wrote does not work; it raises the error: IndexError: invalid slice.
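For reference, the failure comes from slice bounds having to be single integers, so arrays of bounds are rejected. Below is a minimal sketch of the intended semantics written as an explicit loop. The small arrays are made-up stand-ins for the real ~265000-element ones, and the exact exception type depends on the NumPy version, so both likely candidates are caught:

```python
import numpy as np

# Small illustrative stand-ins for the real arrays (assumed values).
mat = np.zeros([10, 10], float)
xmin = np.array([2, 5])
xmax = np.array([5, 9])
ymin = np.array([1, 7])
ymax = np.array([6, 8])
value = np.array([0.2, 0.6])

# Slice bounds must be scalars, so passing whole arrays fails.
try:
    mat[xmin:xmax, ymin:ymax] += value
    slicing_failed = False
except (IndexError, TypeError):
    slicing_failed = True

# The intended operation, written as an explicit loop over the slices.
for x0, x1, y0, y1, v in zip(xmin, xmax, ymin, ymax, value):
    mat[x0:x1, y0:y1] += v
```

This loop is the baseline that any vectorized version has to reproduce.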

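I have been trying to use np.meshgrid as suggested here: NumPy: using 2D index arrays from np.meshgrid in 3D slices, but it has not worked for me yet. Besides, I am looking for a pythonic way of doing this, avoiding for loops.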

Any help will be much appreciated!

Thanks!

Answer 1

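I don't think there is a satisfactory way of vectorizing your problem without resorting to Cython or the like. Let me outline what a pure numpy solution could look like, which should make clear why this is probably not a very good approach.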

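First, let's look at the 1D case. There isn't much you can do with a bunch of slices in numpy, so the first task is to expand them into individual indices. Say your arrays were: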

import numpy as np
mat = np.zeros((10,))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
value = np.array([0.2, 0.6, 0.1, 0.9])

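Then the following code expands the slice limits into lists of (possibly repeated) indices and values, accumulates them with bincount, and adds the result to the original mat: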

x_len = x_max - x_min
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
x_cumval = np.bincount(x_idx, weights=x_val)
mat[:len(x_cumval)] += x_cumval

>>> mat
array([ 0. ,  0.9,  1.1,  1.2,  1.2,  1.6,  1.6,  0.7,  0.6,  0. ])
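To make the index-expansion step easier to follow, here is the same logic wrapped in a small helper. The name expand_slices is made up for illustration, and the sketch assumes at least one slice and stops >= starts for every slice:

```python
import numpy as np

def expand_slices(starts, stops):
    """Return the concatenated indices of all ranges starts[i]:stops[i]."""
    lengths = stops - starts
    cum_len = np.cumsum(lengths)
    idx = np.arange(cum_len[-1])
    # Restart the counter at 0 at the beginning of every range after the first.
    idx[lengths[0]:] -= np.repeat(cum_len[:-1], lengths[1:])
    # Shift each range so that it begins at its own start index.
    idx += np.repeat(starts, lengths)
    return idx

x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
x_idx = expand_slices(x_min, x_max)
# x_idx is [2 3 4 | 5 6 7 8 | 3 4 5 6 7 | 1 2 3 4 5 6] (bars added for clarity)
```

Each value is then repeated once per index of its range, which is what lets bincount sum the contributions that land on the same position.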

It is possible to extend this to your 2D case, although it is anything but trivial, and things start to get hard to follow:

mat = np.zeros((10, 10))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
y_min = np.array([1, 7, 2, 6])
y_max = np.array([6, 8, 6, 9])
value = np.array([0.2, 0.6, 0.1, 0.9])

x_len = x_max - x_min
y_len = y_max - y_min
total_len = x_len * y_len
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
y_min_ = np.repeat(y_min, x_len)
y_len_ = np.repeat(y_len, x_len)
y_cum_len = np.cumsum(y_len_)
y_idx = np.arange(y_cum_len[-1])
y_idx[y_len_[0]:] -= np.repeat(y_cum_len[:-1], y_len_[1:])
y_idx += np.repeat(y_min_, y_len_)
x_idx_ = np.repeat(x_idx, y_len_)
xy_val = np.repeat(x_val, y_len_)
xy_idx = np.ravel_multi_index((x_idx_, y_idx), dims=mat.shape)
xy_cumval = np.bincount(xy_idx, weights=xy_val)
mat.ravel()[:len(xy_cumval)] += xy_cumval

Which produces:

>>> mat
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.2,  0.2,  0.2,  0.2,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.3,  0.3,  0.3,  0.3,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.3,  0.3,  0.3,  0.3,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0.9,  1.5,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0.9,  1.5,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0. ,  0.6,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.6,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])

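But if you have 265,000 two-dimensional slices of arbitrary size, the index arrays will very quickly grow to many millions of items. Having to read and write that much data can cancel out the speed gains that come from using numpy. Frankly, I doubt this is a good option, if for nothing else because of how cryptic your code becomes.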
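A rough way to see the scale: the number of expanded index entries equals the sum of the slice areas. The sketch below uses randomly generated slice sizes purely as an assumption, since the real bounds are not shown in the question, and it assumes 8-byte integers and floats for the memory estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_slices = 265_000
# Hypothetical slice sizes: each slice spans 1 to 20 cells along each axis.
x_len = rng.integers(1, 21, size=n_slices)
y_len = rng.integers(1, 21, size=n_slices)

# One index entry and one weight are needed per covered cell.
n_entries = int(np.sum(x_len * y_len))
# Approximate memory for x_idx_, y_idx, xy_idx (integers) and xy_val (floats).
approx_bytes = n_entries * (3 * 8 + 8)
print(f"{n_entries:,} entries, roughly {approx_bytes / 1e6:.0f} MB of temporaries")
```

Even with these modest made-up sizes the expanded arrays hold tens of millions of entries, whereas a plain loop over the slices only ever touches mat itself.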

Answer 1
2 2014-03-03 06:34:36