
Slicing 2D arrays using indices from arrays in python

I'm working with slices of a 2D numpy array. To select the slices, I have the indices stored in arrays. For example, I have:

mat = np.zeros([xdim,ydim], float)
xmin = np.array([...]) # Array of minimum indices in x
xmax = np.array([...]) # Array of maximum indices in x
ymin = np.array([...]) # Array of minimum indices in y
ymax = np.array([...]) # Array of maximum indices in y
value = np.array([...]) # Values

Where ... just denotes some integer numbers previously calculated. All arrays are well-defined and have lengths of ~265000. What I want to do is something like:

mat[xmin:xmax, ymin:ymax] += value

So that, for the first elements, I would have:

mat[xmin[0]:xmax[0], ymin[0]:ymax[0]] += value[0]
mat[xmin[1]:xmax[1], ymin[1]:ymax[1]] += value[1]

and so on, for the ~265000 elements of the arrays. Unfortunately, what I just wrote does not work; it throws the error: IndexError: invalid slice.
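To be explicit, the element-by-element behaviour I'm after can be written as a plain loop (with small made-up arrays here just for illustration; the real ones have ~265000 entries). This works, but it is exactly the kind of Python-level loop I'd like to avoid:

```python
import numpy as np

# toy data for illustration only -- the real arrays are much larger
mat = np.zeros((10, 10))
xmin = np.array([2, 5])
xmax = np.array([5, 9])
ymin = np.array([1, 7])
ymax = np.array([6, 8])
value = np.array([0.2, 0.6])

# one slice-update per element: correct, but a Python-level loop
for i in range(len(value)):
    mat[xmin[i]:xmax[i], ymin[i]:ymax[i]] += value[i]
```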

I've been trying to use np.meshgrid as suggested here: NumPy: use 2D index array from argmin in a 3D slice, but it hasn't worked for me yet. Besides, I'm looking for a pythonic way to do this, avoiding for loops.

Any help will be much appreciated!

Thanks!

I don't think there is a satisfactory way of vectorizing your problem without resorting to Cython or the like. Let me outline what a pure numpy solution could look like, which should make clear why this is probably not a very good approach.

First, let's look at a 1D case. There's not much you can do with a bunch of slices in numpy, so the first task is to expand them into individual indices. Say your arrays were:

mat = np.zeros((10,))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
value = np.array([0.2, 0.6, 0.1, 0.9])

Then the following code expands the slice limits into lists of (possibly repeating) indices and values, joins them together with bincount, and adds them to the original mat:

x_len = x_max - x_min
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
x_cumval = np.bincount(x_idx, weights=x_val)
mat[:len(x_cumval)] += x_cumval

>>> mat
array([ 0. ,  0.9,  1.1,  1.2,  1.2,  1.6,  1.6,  0.7,  0.6,  0. ])

It is possible to expand this to your 2D case, although it is anything but trivial, and things start getting hard to follow:

mat = np.zeros((10, 10))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
y_min = np.array([1, 7, 2, 6])
y_max = np.array([6, 8, 6, 9])
value = np.array([0.2, 0.6, 0.1, 0.9])

x_len = x_max - x_min
y_len = y_max - y_min
total_len = x_len * y_len
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
y_min_ = np.repeat(y_min, x_len)
y_len_ = np.repeat(y_len, x_len)
y_cum_len = np.cumsum(y_len_)
y_idx = np.arange(y_cum_len[-1])
y_idx[y_len_[0]:] -= np.repeat(y_cum_len[:-1], y_len_[1:])
y_idx += np.repeat(y_min_, y_len_)
x_idx_ = np.repeat(x_idx, y_len_)
xy_val = np.repeat(x_val, y_len_)
xy_idx = np.ravel_multi_index((x_idx_, y_idx), dims=mat.shape)
xy_cumval = np.bincount(xy_idx, weights=xy_val)
mat.ravel()[:len(xy_cumval)] += xy_cumval

Which produces:

>>> mat
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.2,  0.2,  0.2,  0.2,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.3,  0.3,  0.3,  0.3,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.3,  0.3,  0.3,  0.3,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0.9,  1.5,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0.9,  1.5,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0. ,  0.6,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.6,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
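The same kind of sanity check works in 2D: run the full index-expansion machinery and compare it against the straightforward loop over rectangles:

```python
import numpy as np

mat = np.zeros((10, 10))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
y_min = np.array([1, 7, 2, 6])
y_max = np.array([6, 8, 6, 9])
value = np.array([0.2, 0.6, 0.1, 0.9])

# expand every rectangle into flat (row, col) indices, as above
x_len = x_max - x_min
y_len = y_max - y_min
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
y_min_ = np.repeat(y_min, x_len)
y_len_ = np.repeat(y_len, x_len)
y_cum_len = np.cumsum(y_len_)
y_idx = np.arange(y_cum_len[-1])
y_idx[y_len_[0]:] -= np.repeat(y_cum_len[:-1], y_len_[1:])
y_idx += np.repeat(y_min_, y_len_)
x_idx_ = np.repeat(x_idx, y_len_)
xy_val = np.repeat(x_val, y_len_)
xy_idx = np.ravel_multi_index((x_idx_, y_idx), dims=mat.shape)
xy_cumval = np.bincount(xy_idx, weights=xy_val)
mat.ravel()[:len(xy_cumval)] += xy_cumval

# reference result from the explicit loop over rectangles
expected = np.zeros((10, 10))
for x0, x1, y0, y1, v in zip(x_min, x_max, y_min, y_max, value):
    expected[x0:x1, y0:y1] += v
```

Again the two results agree, which is the only real virtue of this approach: it is loop-free, but hardly readable.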

But if you have 265,000 two-dimensional slices of arbitrary size, the indexing arrays will quickly grow to many millions of items, and having to read and write that much data can negate the speed gains of using numpy. Frankly, I doubt this is a good option at all, if nothing else because of how cryptic your code is going to become.

