
Indexing a numpy array using a numpy array of slices

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of this post.)

I wrote a function that subdivides an n-dimensional array into smaller ones such that each subdivision has at most max_chunk_size elements in total.

Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, the function doesn't operate on the data itself. Instead it creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve subdivisions by calling the_array[indexer[i]] (see the examples below).

Also, the array of these indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, and so on.

I expected that I could also concatenate these blocks by calling the_array[indexer[i:i+2,j,k]], and that the_array[indexer] would return just the_array. However, such calls result in an error:

IndexError: arrays used as indices must be of integer (or boolean) type

Is there a simple way around this error?

Here's the code:

import itertools
from functools import reduce  # reduce is not a builtin in Python 3

import numpy as np

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()

    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)

    # create a list of slices for each dimension
    # (.tolist() yields plain Python ints, so the slices hold no NumPy scalars):
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int).tolist(), slice_shape.tolist())]

    # one object-array element per chunk, each holding a tuple of slices
    # (the np.object alias was removed in NumPy 1.24; the builtin object works everywhere):
    result = np.empty(reduce(lambda a, b: a * len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el
    return result.reshape(tuple(np.ceil(shape / slice_shape).astype(int)))
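
To make the chunk-shape calculation in subdivide easier to follow, here is a small trace of the arithmetic for the 6x15 array and max_chunk_size=16 used in the example below. The names fraction, scale, chunk_shape and grid_shape are mine and only serve this illustration.

```python
import numpy as np

shape = np.array((6, 15), dtype=float)
max_chunk_size = 16

# Fraction of the whole array that one chunk may hold, capped at 1
# (a cap of 1 means the array is small enough to stay in one piece).
fraction = min(max_chunk_size / shape.prod(), 1.0)

# Shrink every axis by the same factor so the chunk keeps the array's proportions.
scale = fraction ** (1.0 / len(shape))
chunk_shape = np.floor(shape * scale).astype(int)

# Number of chunks along each axis; the last chunk on an axis may be smaller.
grid_shape = np.ceil(shape / chunk_shape).astype(int)

print(chunk_shape.tolist())  # [2, 6]
print(grid_shape.tolist())   # [3, 3]
```

One thing to watch for: because of the floor, a very short axis can end up with a chunk length of 0, and range() then raises a ValueError for the zero step. Clamping chunk_shape to at least 1 would be one way to guard against that.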

Here's an example usage:

>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
        (slice(0, 2, None), slice(6, 12, None)),
        (slice(0, 2, None), slice(12, 15, None))],
       [(slice(2, 4, None), slice(0, 6, None)),
        (slice(2, 4, None), slice(6, 12, None)),
        (slice(2, 4, None), slice(12, 15, None))],
       [(slice(4, 6, None), slice(0, 6, None)),
        (slice(4, 6, None), slice(6, 12, None)),
        (slice(4, 6, None), slice(12, 15, None))]], dtype=object)

>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
       [45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
       [27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
       [81, 82, 83, 84, 85, 86]])

>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type

Here's a solution based on hpaulj's answer:

import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()

        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)

        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n)) \
          for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
            for n, step_size in zip(shape.astype(int), slice_shape)]

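        # one row of per-axis slices for every chunk, arranged on the chunk grid
        # (np.object was removed in NumPy 1.24, so use the builtin object)
        grid_shape = tuple(np.ceil(shape / slice_shape).astype(int))
        self.slices = np.array(list(itertools.product(*slices)),
                               dtype=object).reshape(grid_shape + (len(shape),))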

    def __getitem__(self, args):
        if not isinstance(args, tuple):
            args = (args,)

        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if isinstance(arg, int) else arg for arg in args)

        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]

        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] + \
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])] \
                                for i in range(len(slices.shape) - 1)))

Example usage:

>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> ar[subdiv[0]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

>>> ar[subdiv[:2,1]]
array([[ 6,  7,  8,  9, 10, 11],
       [21, 22, 23, 24, 25, 26],
       [36, 37, 38, 39, 40, 41],
       [51, 52, 53, 54, 55, 56]])

>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> ar[subdiv[...,:2]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])

Your slices produce 2x6 and 2x3 arrays.

In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)

In [38]: ar[tuple(subslice[0,0])]
Out[38]: 
array([[ 6,  7,  8,  9, 10, 11],
       [21, 22, 23, 24, 25, 26]])

My numpy version expects me to turn the subslice into a tuple. This is the same as

ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]

That's just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2-element tuple of slices, lists, arrays, or integers. It won't work with an array of slice objects.
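
A minimal sketch of that difference, assuming a current NumPy release: the same two slices work as a tuple but fail when they sit in an object array. The variable names are only for illustration.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# One indexer: a plain tuple with one slice per axis.
indexer = (slice(0, 2), slice(6, 12))
block = ar[indexer]              # same as ar[0:2, 6:12]

# The same two slices stored in an object array, like subslice[0, 0].
as_array = np.array([slice(0, 2), slice(6, 12)], dtype=object)

try:
    ar[as_array]                 # treated as an index array, not as a tuple of slices
    failed = False
except IndexError:
    failed = True

# Converting back to a tuple restores basic slicing.
same_block = ar[tuple(as_array)]

print(failed)                    # True
print(block.shape)               # (2, 6)
```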

However, it is possible to concatenate the results into a larger array. That can be done after indexing, or the slices can be converted into index lists.

np.bmat, for example, concatenates a 2d arrangement of arrays:

In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]], 
                  [ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]: 
matrix([[ 6,  7,  8,  9, 10, 11, 12, 13, 14],
        [21, 22, 23, 24, 25, 26, 27, 28, 29],
        [36, 37, 38, 39, 40, 41, 42, 43, 44],
        [51, 52, 53, 54, 55, 56, 57, 58, 59]])

You could generalize this. It just uses hstack and vstack on the nested lists. The result is an np.matrix, but it can be converted back to an array.
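
One possible generalization, as a sketch: np.block does the same nested concatenation but returns a plain ndarray and is not limited to 2d. The helper name assemble and the nested-list layout of the indexers are my own assumptions, not something from the question's code.

```python
import numpy as np

def assemble(ar, indexers):
    """Index `ar` with every slice tuple in a nested list and glue the blocks together."""
    def expand(node):
        # A tuple is a single indexer; a list is one more level of nesting.
        if isinstance(node, tuple):
            return ar[node]
        return [expand(child) for child in node]
    # np.block joins the innermost lists along the last axis,
    # the next level along the second-to-last axis, and so on.
    return np.block(expand(indexers))

ar = np.arange(90).reshape(6, 15)
grid = [
    [(slice(0, 2), slice(6, 12)), (slice(0, 2), slice(12, 15))],
    [(slice(2, 4), slice(6, 12)), (slice(2, 4), slice(12, 15))],
]
combined = assemble(ar, grid)
print(combined.shape)  # (4, 9)
```

The nesting depth of the list has to match the number of axes being stitched, and np.block wants lists (not tuples) for the nesting levels, which is why the tuples are used only for the indexers themselves.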

The other approach is to use tools like np.arange, np.r_, and np.ix_ to create index arrays. It'll take some playing around to generate an example.

To combine the [0,0] and [0,1] subslices:

In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]

In [66]: i,j
Out[66]: (array([0, 1]), array([ 6,  7,  8,  9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]: 
(array([[0],
        [1]]), array([[ 6,  7,  8,  9, 10, 11, 12, 13, 14]]))

In [70]: ar[ix]
Out[70]: 
array([[ 6,  7,  8,  9, 10, 11, 12, 13, 14],
       [21, 22, 23, 24, 25, 26, 27, 28, 29]])

Or, with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces the 4x9 array.
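
Wrapped up as a sketch, that could look like the following. merge_indexers and bounding_slices are hypothetical helper names; both assume one list of slices per axis and slices with explicit start and stop values (as subdivide produces them).

```python
import numpy as np

def merge_indexers(per_axis_slices):
    """Turn one list of slices per axis into open-mesh index arrays."""
    # np.r_ expands each slice into integers and concatenates them.
    axes = [np.r_[tuple(slices)] for slices in per_axis_slices]
    return np.ix_(*axes)

def bounding_slices(per_axis_slices):
    """For contiguous blocks only: one slice per axis spanning the whole selection."""
    return tuple(slice(min(s.start for s in slices), max(s.stop for s in slices))
                 for slices in per_axis_slices)

ar = np.arange(90).reshape(6, 15)
rows = [slice(0, 2), slice(2, 4)]
cols = [slice(6, 12), slice(12, 15)]

picked = ar[merge_indexers([rows, cols])]   # advanced indexing: returns a copy
view = ar[bounding_slices([rows, cols])]    # basic slicing: returns a view

print(picked.shape)                         # (4, 9)
print(np.array_equal(picked, view))         # True
```

The np.ix_ version also works when the chosen blocks are not neighbours (e.g. rows 0:2 and 4:6), at the cost of copying the data. The bounding-slice version is only valid when the blocks are adjacent, but it avoids the copy.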
