简体   繁体   中英

Selecting multiple slices from a numpy array at once

I'm looking for a way to select multiple slices from a numpy array at once. Say we have a 1D data array and want to extract three portions of it like below:

data_extractions = []

for start_index in range(0, 3):
    data_extractions.append(data[start_index: start_index + 5])

stride_tricks can do that

a = np.arange(10)
b = np.lib.stride_tricks.as_strided(a, (3, 5), 2 * a.strides)
b
# array([[0, 1, 2, 3, 4],
#        [1, 2, 3, 4, 5],
#        [2, 3, 4, 5, 6]])

Please note that b references the same memory as a , in fact multiple times (for example b[0, 1] and b[1, 0] are the same memory address). It is therefore safest to make a copy before working with the new structure.

nd can be done in a similar fashion, for example 2d -> 4d

a = np.arange(16).reshape(4, 4)
b = np.lib.stride_tricks.as_strided(a, (3,3,2,2), 2*a.strides)
b.reshape(9,2,2) # this forces a copy
# array([[[ 0,  1],
#         [ 4,  5]],

#        [[ 1,  2],
#         [ 5,  6]],

#        [[ 2,  3],
#         [ 6,  7]],

#        [[ 4,  5],
#         [ 8,  9]],

#        [[ 5,  6],
#         [ 9, 10]],

#        [[ 6,  7],
#         [10, 11]],

#        [[ 8,  9],
#         [12, 13]],

#        [[ 9, 10],
#         [13, 14]],

#        [[10, 11],
#         [14, 15]]])

You can use the indexes to select the rows you want into the appropriate shape. For example:

 data = np.random.normal(size=(100,2,2,2))

 # Creating an array of row-indexes
 indexes = np.array([np.arange(0,5), np.arange(1,6), np.arange(2,7)])
 # data[indexes] will return an element of shape (3,5,2,2,2). Converting
 # to list happens along axis 0
 data_extractions = list(data[indexes])

 np.all(data_extractions[1] == data[1:6])
 True

The final comparison is against the original data.

In this post is an approach with strided-indexing scheme using np.lib.stride_tricks.as_strided that basically creates a view into the input array and as such is pretty efficient for creation and being a view occupies nomore memory space. Also, this works for ndarrays with generic number of dimensions.

Here's the implementation -

def strided_axis0(a, L):
    # Store the shape and strides info
    shp = a.shape
    s  = a.strides

    # Compute length of output array along the first axis
    nd0 = shp[0]-L+1

    # Setup shape and strides for use with np.lib.stride_tricks.as_strided
    # and get (n+1) dim output array
    shp_in = (nd0,L)+shp[1:]
    strd_in = (s[0],) + s
    return np.lib.stride_tricks.as_strided(a, shape=shp_in, strides=strd_in)

Sample run for a 4D array case -

In [44]: a = np.random.randint(11,99,(10,4,2,3)) # Array

In [45]: L = 5      # Window length along the first axis

In [46]: out = strided_axis0(a, L)

In [47]: np.allclose(a[0:L], out[0])  # Verify outputs
Out[47]: True

In [48]: np.allclose(a[1:L+1], out[1])
Out[48]: True

In [49]: np.allclose(a[2:L+2], out[2])
Out[49]: True

You can slice your array with a prepared slicing array

a = np.array(list('abcdefg'))

b = np.array([
        [0, 1, 2, 3, 4],
        [1, 2, 3, 4, 5],
        [2, 3, 4, 5, 6]
    ])

a[b]

However, b doesn't have to generated by hand in this way. It can be more dynamic with

b = np.arange(5) + np.arange(3)[:, None]

In the general case you have to do some sort of iteration - and concatenation - either when constructing the indexes or when collecting the results. It's only when the slicing pattern is itself regular that you can use a generalized slicing via as_strided .

The accepted answer constructs an indexing array, one row per slice. So that is iterating over the slices, and arange itself is a (fast) iteration. And np.array concatenates them on a new axis ( np.stack generalizes this).

In [264]: np.array([np.arange(0,5), np.arange(1,6), np.arange(2,7)])
Out[264]: 
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])

indexing_tricks convenience methods to do the same thing:

In [265]: np.r_[0:5, 1:6, 2:7]
Out[265]: array([0, 1, 2, 3, 4, 1, 2, 3, 4, 5, 2, 3, 4, 5, 6])

This takes the slicing notation, expands it with arange and concatenates. It even lets me expand and concatenate into 2d

In [269]: np.r_['0,2',0:5, 1:6, 2:7]
Out[269]: 
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])

In [270]: data=np.array(list('abcdefghijk'))
In [272]: data[np.r_['0,2',0:5, 1:6, 2:7]]
Out[272]: 
array([['a', 'b', 'c', 'd', 'e'],
       ['b', 'c', 'd', 'e', 'f'],
       ['c', 'd', 'e', 'f', 'g']], 
      dtype='<U1')
In [273]: data[np.r_[0:5, 1:6, 2:7]]
Out[273]: 
array(['a', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'f', 'c', 'd', 'e',
       'f', 'g'], 
      dtype='<U1')

Concatenating results after indexing also works.

In [274]: np.stack([data[0:5],data[1:6],data[2:7]])

My memory from other SO questions is that relative timings are in the same order of magnitude. It may vary for example with the number of slices versus their length. Overall the number of values that have to be copied from source to target will be the same.

If the slices vary in length, you'd have to use the flat indexing.

No matter which approach you choose, if 2 slices contain same element, it doesn't support mathematical operations correctly unlesss you use ufunc.at<\/code> which can be more inefficient than loop. For testing:

def as_strides(arr, window_size, stride, writeable=False):
    '''Get a strided sub-matrices view of a 4D ndarray.

    Args:
        arr (ndarray): input array with shape (batch_size, m1, n1, c).
        window_size (tuple): with shape (m2, n2).
        stride (tuple): stride of windows in (y_stride, x_stride).
        writeable (bool): it is recommended to keep it False unless needed
    Returns:
        subs (view): strided window view, with shape (batch_size, y_nwindows, x_nwindows, m2, n2, c)

    See also numpy.lib.stride_tricks.sliding_window_view
    '''
    batch_size = arr.shape[0]
    m1, n1, c = arr.shape[1:]
    m2, n2 = window_size
    y_stride, x_stride = stride

    view_shape = (batch_size, 1 + (m1 - m2) // y_stride,
                  1 + (n1 - n2) // x_stride, m2, n2, c)
    strides = (arr.strides[0], y_stride * arr.strides[1],
               x_stride * arr.strides[2]) + arr.strides[1:]
    subs = np.lib.stride_tricks.as_strided(arr,
                                           view_shape,
                                           strides=strides,
                                           writeable=writeable)
    return subs


import numpy as np
np.random.seed(1)

Xs = as_strides(np.random.randn(1, 5, 5, 2), (3, 3), (2, 2), writeable=True)[0]
print('input\n0,0\n', Xs[0, 0])
np.add.at(Xs, np.s_[:], 5)
print('unbuffered sum output\n0,0\n', Xs[0,0])
np.add.at(Xs, np.s_[:], -5)
Xs = Xs + 5
print('normal sum output\n0,0\n', Xs[0, 0])

We can use list comprehension for this

data=np.array([1,2,3,4,5,6,7,8,9,10])
data_extractions=[data[b:b+5] for b in [1,2,3,4,5]]
data_extractions

Results

[array([2, 3, 4, 5, 6]), array([3, 4, 5, 6, 7]), array([4, 5, 6, 7, 8]), array([5, 6, 7, 8, 9]), array([ 6,  7,  8,  9, 10])]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM