Sliding window of M-by-N shape numpy.ndarray

I have a NumPy array of shape (6,2):

[[ 0, 1],
 [10,11],
 [20,21],
 [30,31],
 [40,41],
 [50,51]]

I need a sliding window with step size 1 and window size 3, like this:

[[ 0, 1,10,11,20,21],
 [10,11,20,21,30,31],
 [20,21,30,31,40,41],
 [30,31,40,41,50,51]]

I'm looking for a NumPy solution. If your solution could parametrise the shape of the original array as well as the window size and step size, that'd be great.

I found this related answer, "Using strides for an efficient moving average filter", but I don't see how to specify the step size there or how to collapse the window from a 3D array to a contiguous 2D array. There is also "Rolling or sliding window iterator?", but that's in pure Python and I'm not sure how efficient it is. Also, it supports elements but does not join them together at the end if each element has multiple features.

You can do a vectorized sliding window in NumPy using fancy indexing.

>>> import numpy as np

>>> a = np.array([[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]])

>>> a
array([[ 0,  1],
       [10, 11],
       [20, 21],                      #define our 2d numpy array
       [30, 31],
       [40, 41],
       [50, 51]])

>>> a = a.flatten()

>>> a
array([ 0,  1, 10, 11, 20, 21, 30, 31, 40, 41, 50, 51])    #flattened numpy array

>>> indexer = np.arange(6)[None, :] + 2*np.arange(4)[:, None]

>>> indexer
array([[ 0,  1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6,  7],            #sliding window indices
       [ 4,  5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10, 11]])

>>> a[indexer]
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],            #values of a over sliding window
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])

>>> np.sum(a[indexer], axis=1)
array([ 63, 123, 183, 243])         #sum of values in 'a' under the sliding window.

An explanation of what this code is doing:

np.arange(6)[None, :] creates a row vector 0 through 5, and np.arange(4)[:, None] creates a column vector 0 through 3. Broadcasting their sum yields a 4x6 index matrix in which each row (six indices) represents one window, and the number of rows (four) is the number of windows. The factor of 2 makes the window advance 2 elements at a time, which is necessary to step over one whole row (pair) of the original array. Indexing the flattened array with this integer matrix (fancy indexing) returns the windows, on which you can then compute aggregates such as sum.
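
The hard-coded 6, 4 and 2 above can be derived from the array shape, window size and step size. The following is a minimal sketch of that idea, not part of the original answer; the helper name window_indices and its parameters are made up for illustration.

```python
import numpy as np

def window_indices(n_rows, n_cols, width=3, step=1):
    """Index matrix into the flattened array: one row of indices per window."""
    n_windows = (n_rows - width) // step + 1
    # Offsets inside a single window: 0 .. width*n_cols - 1
    within = np.arange(width * n_cols)[None, :]
    # Flat index at which each window starts (step rows = step*n_cols elements)
    starts = (step * n_cols * np.arange(n_windows))[:, None]
    return starts + within

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
idx = window_indices(*a.shape, width=3, step=1)
windows = a.ravel()[idx]            # shape (4, 6)
sums = windows.sum(axis=1)          # one aggregate per window
```

With width=3 and step=1 on the (6,2) array, idx matches the indexer shown earlier; passing step=2 keeps only the windows that start at rows 0 and 2.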

In [1]: import numpy as np

In [2]: a = np.array([[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]])

In [3]: w = np.hstack((a[:-2],a[1:-1],a[2:]))

In [4]: w
Out[4]: 
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])

You could write this as a function like so:

def window_stack(a, stepsize=1, width=3):
    n = a.shape[0]
    # np.hstack needs a sequence (here a list), not a generator
    return np.hstack([a[i:1+n+i-width:stepsize] for i in range(0, width)])

This doesn't really depend on the shape of the original array, as long as a.ndim == 2. Note that I never use either length in the interactive version. The second dimension of the shape is irrelevant; each row can be as long as you want. Thanks to @Jaime's suggestion, you can do it without checking the shape at all:

def window_stack(a, stepsize=1, width=3):
    # np.hstack needs a sequence (here a list), not a generator
    return np.hstack([a[i:1+i-width or None:stepsize] for i in range(0, width)])
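
As a quick check of the claim that the row length does not matter, here is a small self-contained sketch. It repeats the function (with a list passed to np.hstack) and calls it with a different step size and a different row length; the test arrays are arbitrary examples.

```python
import numpy as np

def window_stack(a, stepsize=1, width=3):
    # One shifted slice per position in the window, stacked side by side.
    return np.hstack([a[i:1 + i - width or None:stepsize] for i in range(width)])

a = np.arange(12).reshape(6, 2)

step1 = window_stack(a)                                     # shape (4, 6)
step2 = window_stack(a, stepsize=2)                         # shape (2, 6)
wide = window_stack(np.arange(18).reshape(6, 3), width=2)   # shape (5, 6)
```

Each output row is width consecutive input rows joined end to end, so the number of columns is always width * a.shape[1].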

One solution is

np.lib.stride_tricks.as_strided(a, shape=(4,6), strides=(8,4))

Using strides is intuitive once you start thinking in terms of pointers/addresses.

The as_strided() function takes three main arguments:

  1. data
  2. shape
  3. strides

data is the array on which we operate.

To use as_strided() for implementing sliding window functions, we must compute the shape of the output beforehand. In the question, (4,6) is the shape of the output. If the dimensions are not correct, we end up reading garbage values. This is because we are accessing data by moving the pointer by a number of bytes (depending on the data type).

Determining the correct value of strides is essential to get the expected results. Before calculating strides, find out the memory occupied by each element using arr.strides[-1]. In this example, the memory occupied by one element is 4 bytes (a 32-bit integer dtype; with 64-bit integers it would be 8 bytes and both strides would double). NumPy arrays are created in row-major order by default. The first element of the next row is right next to the last element of the current row.

Example:

0 , 1 | 10, 11 | ...

10 is right next to 1.

Imagine the 2D array reshaped to 1D (this is acceptable because the data is stored in row-major order). The first element of each output row is every second element of the 1D array, i.e. the elements at even (0-based) indices:

0, 10, 20, 30, ..

Therefore, the number of bytes we need to step in memory to move from 0 to 10, 10 to 20, and so on is 2 * (memory size of one element). Each row has a stride of 2 * 4 bytes = 8 bytes. For a given row in the output, all the elements are adjacent to each other in our imaginary 1D array. To get the next element in a row, take one step equal to the size of an element. The column stride is therefore 4 bytes.

Therefore, strides=(8,4).

An alternate explanation: the output has a shape of (4,6) and the column stride is 4 bytes. So the first row starts at index 0 and has 6 elements, each spaced 4 bytes apart. After the first row is collected, the second row starts 8 bytes away from the start of the current row. The third row starts 8 bytes away from the start of the second row, and so on.

The shape determines the number of rows and columns we need. The strides define the memory steps needed to start a new row and to collect the next column element.
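
The stride arithmetic above can be written without hard-coding the 4-byte element size by reading the strides from the array itself. This is a minimal sketch under the assumption of a C-contiguous 2D input; the function name strided_windows and its parameters are illustrative and not from the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_windows(a, width=3, step=1):
    # The stride arithmetic assumes row-major, contiguous data.
    a = np.ascontiguousarray(a)
    n_rows, n_cols = a.shape
    n_windows = (n_rows - width) // step + 1
    row_stride, item_stride = a.strides   # bytes per row, bytes per element
    return as_strided(
        a,
        shape=(n_windows, width * n_cols),
        strides=(step * row_stride, item_stride),
        writeable=False,                  # windows overlap, so guard against writes
    )

a32 = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]], dtype=np.int32)
w32 = strided_windows(a32)                    # strides (8, 4), as derived above
w64 = strided_windows(a32.astype(np.int64))   # strides (16, 8)
```

Computing the output shape from the input shape keeps every read inside the original buffer, which is what prevents the garbage values mentioned earlier.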

A short list comprehension is possible with more_itertools.windowed [1]:

Given

import numpy as np
import more_itertools as mit


a = [["00","01"],
     ["10","11"],
     ["20","21"],
     ["30","31"],
     ["40","41"],
     ["50","51"]]

b = np.array(a)

Code

np.array([list(mit.flatten(w)) for w in mit.windowed(a, n=3)])

or

np.array([[i for item in w for i in item] for w in mit.windowed(a, n=3)])

or

np.array(list(mit.windowed(b.ravel(), n=6)))

Output

array([['00', '01', '10', '11', '20', '21'],
       ['10', '11', '20', '21', '30', '31'],
       ['20', '21', '30', '31', '40', '41'],
       ['30', '31', '40', '41', '50', '51']], 
      dtype='<U2')

Sliding windows of size n=3 are created and flattened. Note that the default step size is 1, i.e. more_itertools.windowed(..., step=1).

Performance

As an array, the accepted answer is fastest.

%timeit np.hstack((a[:-2], a[1:-1], a[2:]))
# 37.5 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.hstack((b[:-2], b[1:-1], b[2:]))
# 12.9 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.array([list(mit.flatten(w)) for w in mit.windowed(a, n=3)])
# 23.2 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.array([[i for item in w for i in item] for w in mit.windowed(a, n=3)])
# 21.2 µs ± 999 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.array(list(mit.windowed(b.ravel(), n=6)))
# 43.4 µs ± 374 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

[1] A third-party library that implements itertools recipes and many helpful tools.

Starting in NumPy 1.20, using the new sliding_window_view to slide/roll over windows of elements, and based on the same idea as user42541's answer, we can do:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# values = np.array([[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]])
sliding_window_view(values.flatten(), window_shape = 2*3)[::2]
# array([[ 0,  1, 10, 11, 20, 21],
#        [10, 11, 20, 21, 30, 31],
#        [20, 21, 30, 31, 40, 41],
#        [30, 31, 40, 41, 50, 51]])

where 2 is the size of the sub-arrays and 3 is the window size.

Details of the intermediate steps:

# values = np.array([[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]])

# Flatten the array (concatenate sub-arrays):
values.flatten()
# array([ 0,  1, 10, 11, 20, 21, 30, 31, 40, 41, 50, 51])

# Slide through windows of size 2*3=6:
sliding_window_view(values.flatten(), 2*3)
# array([[ 0,  1, 10, 11, 20, 21],
#        [ 1, 10, 11, 20, 21, 30],
#        [10, 11, 20, 21, 30, 31],
#        [11, 20, 21, 30, 31, 40],
#        [20, 21, 30, 31, 40, 41],
#        [21, 30, 31, 40, 41, 50],
#        [30, 31, 40, 41, 50, 51]])

# Only keep even rows (1 row in 2 - if sub-arrays have a size of x, then replace 2 with x):
sliding_window_view(values.flatten(), 2*3)[::2]
# array([[ 0,  1, 10, 11, 20, 21],
#        [10, 11, 20, 21, 30, 31],
#        [20, 21, 30, 31, 40, 41],
#        [30, 31, 40, 41, 50, 51]])
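
A variant of the same idea skips the flatten-then-subsample step by sliding a (window, n_cols) window over the 2D array directly. This is a sketch assuming NumPy >= 1.20; the helper name row_windows and the step parameter are additions for illustration, not part of the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def row_windows(values, width=3, step=1):
    n_cols = values.shape[1]
    # Shape (n_rows - width + 1, 1, width, n_cols): one window per starting row.
    view = sliding_window_view(values, (width, n_cols))
    # Drop the singleton axis, keep every `step`-th window, flatten each window.
    return view[::step, 0].reshape(-1, width * n_cols)

values = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
result = row_windows(values, width=3, step=1)
```

The final reshape copies the data, because overlapping windows cannot be laid out as a plain 2D view of the original buffer.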

As of NumPy version 1.20.0, this can be done using

np.lib.stride_tricks.sliding_window_view(arr, winsize)

Example:

>>> arr = np.arange(0, 9).reshape((3, 3))
>>> np.lib.stride_tricks.sliding_window_view(arr, (2, 2))

array([[[[0, 1],
         [3, 4]],

        [[1, 2],
         [4, 5]]],


       [[[3, 4],
         [6, 7]],

        [[4, 5],
         [7, 8]]]])

You can read more about it here.

Here is a one-liner using NumPy >= 1.17:

rowsJoined = 3

splits = np.vstack(np.split(x,np.array([[i, i + rowsJoined] for i in range(x.shape[0] - (rowsJoined - 1))]).reshape(-1))).reshape(-1, rowsJoined * x.shape[1]) 

Test

x = np.array([[0,1],
              [10,11],
              [20,21],
              [30,31],
              [40,41],
              [50,51]])

Result

[[ 0  1 10 11 20 21]
 [10 11 20 21 30 31]
 [20 21 30 31 40 41]
 [30 31 40 41 50 51]]

Test Performance On Large Array

import numpy as np
import time

x = np.array(range(1000)).reshape(-1, 2)
rowsJoined = 3

all_t = 0.
for i in range(1000):
    start_ = time.time()
    np.vstack(
        np.split(x,np.array([[i, i + rowsJoined] for i in range(x.shape[0] - (rowsJoined - 1))])
                    .reshape(-1))).reshape(-1, rowsJoined * x.shape[1])
    all_t += time.time() - start_

print('Average Time of 1000 Iterations on Array of Shape '
      '500 x 2 is: {} Seconds.'.format(all_t/1000.))

Performance Result

Average Time of 1000 Iterations on Array of Shape 500 x 2 is: 0.0016909 Seconds.

This is a pure Python implementation:

def sliding_window(arr, window=3):
    i = iter(arr)
    a = []
    for e in range(0, window): a.append(next(i))
    yield a
    for e in i:
        a = a[1:] + [e]
        yield a

An example:

# flatten array
flatten = lambda l: [item for sublist in l for item in sublist]

a = [[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]]
w = sliding_window(a, window=3)
print( list(map(flatten,w)) )

[[0, 1, 10, 11, 20, 21], [10, 11, 20, 21, 30, 31], [20, 21, 30, 31, 40, 41], [30, 31, 40, 41, 50, 51]]
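
For comparison, here is a pure-Python sketch that also supports a step size and flattens each window as it goes. It uses collections.deque instead of list slicing; the function name sliding_window_flat and the step parameter are illustrative additions, not part of the answer above.

```python
from collections import deque
from itertools import chain

def sliding_window_flat(rows, window=3, step=1):
    """Yield flattened windows of `window` consecutive rows, advancing `step` rows."""
    buf = deque(maxlen=window)          # the oldest row drops out automatically
    for i, row in enumerate(rows):
        buf.append(row)
        start = i - window + 1          # index of the first row in the current window
        if start >= 0 and start % step == 0:
            yield list(chain.from_iterable(buf))

a = [[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]]
result = list(sliding_window_flat(a, window=3))
every_other = list(sliding_window_flat(a, window=3, step=2))
```

If the input has fewer rows than the window size, this version simply yields nothing.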

Benchmark

import timeit
def benchmark():
  a = [[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]]
  # note: this only creates the generator; the windows are never consumed
  sliding_window(a, window=3)

times = timeit.Timer(benchmark).repeat(3, number=1000)
time_taken = min(times) / 1000
print(time_taken)

1.0944640007437556e-06
