How to implement maxpool: taking the maximum over a sliding window on an image or tensor

Question

In short: I am looking for a simple numpy implementation (maybe a one-liner) of MaxPool, that is, the maximum over a window of a numpy.ndarray, taken at every location of the window across the dimensions.

In more detail: I am implementing a convolutional neural network ("CNN"), and one of the typical layers in such a network is the MaxPool layer (look for example here). Writing y = MaxPool(x, S), where x is an input ndarray and S is a parameter, the output of MaxPool is given in pseudocode by:

     y[b,h,w,c] = max(x[b, s*h + i, s*w + j, c]) over i = 0,..., S-1; j = 0,...,S-1.

That is, y is an ndarray whose value at indices b,h,w,c equals the maximum taken over a window of size S x S along the second and third dimensions of the input x, with the window "corner" placed at the indices b,h,w,c.

Some additional details: The network is implemented using numpy. A CNN has many "layers", where the output of one layer is the input to the next layer. The input to a layer is a numpy.ndarray called a "tensor". In my case tensors are 4-dimensional numpy.ndarrays, x. That is, x.shape is a tuple (B,H,W,C). The size of each dimension changes after the tensor is processed by a layer; for example, the input to layer i = 4 can have size B = 10, H = 24, W = 24, C = 3, while the output, i.e. the input to layer i+1, has B = 10, H = 12, W = 12, C = 5. As indicated in the comments, the size after applying MaxPool is (B, H - S + 1, W - S + 1, C).
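To pin down the semantics before looking for a one-liner, here is a minimal loop-based reference sketch. The function name maxpool_naive and the stride parameter are my own assumptions, not part of the question: with stride=1 the output has the (B, H - S + 1, W - S + 1, C) shape mentioned above, while stride=2 with S=2 reproduces the 24 -> 12 reduction from the layer example.

```python
import numpy as np

def maxpool_naive(x, S, stride=1):
    """Reference max pooling over axes 1 and 2 of a (B, H, W, C) array.

    `stride` is a hypothetical extra parameter; stride=1 slides the
    window one position at a time.
    """
    B, H, W, C = x.shape
    out_h = (H - S) // stride + 1
    out_w = (W - S) // stride + 1
    y = np.empty((B, out_h, out_w, C), dtype=x.dtype)
    for h in range(out_h):
        for w in range(out_w):
            # S x S window whose corner sits at (stride*h, stride*w)
            window = x[:, stride * h:stride * h + S, stride * w:stride * w + S, :]
            y[:, h, w, :] = window.max(axis=(1, 2))
    return y

x = np.random.randint(0, 9, (10, 24, 24, 3))
print(maxpool_naive(x, S=5).shape)            # (10, 20, 20, 3)
print(maxpool_naive(x, S=2, stride=2).shape)  # (10, 12, 12, 3)
```

This is slow for large inputs, but it is handy as a ground truth when checking a vectorized version.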

For concreteness: if I use

import numpy as np

y = np.amax(x, axis = (1,2))

where x.shape is, say, (2,3,3,4), this gives me what I want, but only for the degenerate case where the window I am maximizing over has size 3 x 3, the full size of the second and third dimensions of x. That is not exactly what I want.
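A small check of that degenerate case, as a sketch with made-up input values: reducing over axes 1 and 2 removes both spatial dimensions entirely, so there is only one window position per (b, c) pair.

```python
import numpy as np

# Illustrative input; the values themselves are arbitrary
x = np.arange(2 * 3 * 3 * 4).reshape(2, 3, 3, 4)

y = np.amax(x, axis=(1, 2))
print(y.shape)  # (2, 4)

# Each entry is the max of the single 3 x 3 window for that batch/channel
print(y[0, 0] == x[0, :, :, 0].max())  # True
```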

Answer 1

Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows, resulting in a 6D array of shape (B,H-S+1,W-S+1,S,S,C), and then simply taking the max along the fourth and fifth axes, which gives an output array of shape (B,H-S+1,W-S+1,C). The intermediate 6D array is a view into the input array and as such does not occupy any more memory. The subsequent max, being a reduction, makes efficient use of the sliding views.

Thus, an implementation would be -

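# Based on http://stackoverflow.com/a/41850409/3293881
import numpy as np

def patchify(img, patch_shape):
    # img: 4D array of shape (a, X, Y, b); patch_shape: window size (x, y)
    a, X, Y, b = img.shape
    x, y = patch_shape
    # One (x, y) window per valid position along axes 1 and 2
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    # Window axes reuse the strides of the image axes they slide along
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)

# Max over the two window axes gives shape (a, X - x + 1, Y - y + 1, b)
out = patchify(x, (S,S)).max(axis=(3,4))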
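As an alternative sketch, newer NumPy versions (1.20 and later, as far as I know) ship np.lib.stride_tricks.sliding_window_view, which builds the same kind of read-only window view without computing strides by hand. The function name maxpool_swv is my own; note that sliding_window_view appends the window axes at the end, so the reduction runs over the last two axes rather than axes 3 and 4.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def maxpool_swv(x, S):
    """Stride-1 max pooling over axes 1 and 2 of a (B, H, W, C) array."""
    # View of shape (B, H-S+1, W-S+1, C, S, S); no data is copied here
    windows = sliding_window_view(x, (S, S), axis=(1, 2))
    # Reduce over the two trailing window axes
    return windows.max(axis=(-2, -1))

x = np.random.randint(0, 9, (10, 24, 24, 3))
print(maxpool_swv(x, 5).shape)  # (10, 20, 20, 3)
```

Since the view is read-only, this avoids the usual as_strided risk of accidentally writing through overlapping windows.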

Sample run -

In [224]: x = np.random.randint(0,9,(10,24,24,3))

In [225]: S = 5

In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True

In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)

In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)
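The run above slides the window one position at a time. If the window should instead jump by a fixed step (for example, to halve H and W from 24 to 12 as in the question's layer example), one possible sketch is to scale the strides of the two output axes. The function name maxpool_strided and its stride parameter are assumptions of mine, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def maxpool_strided(x, S, stride):
    """Max pooling with an S x S window moved `stride` positions at a time."""
    B, H, W, C = x.shape
    sB, sH, sW, sC = x.strides
    out_h = (H - S) // stride + 1
    out_w = (W - S) // stride + 1
    shape = (B, out_h, out_w, S, S, C)
    # Output axes step by `stride` elements; window axes step by one element
    strides = (sB, sH * stride, sW * stride, sH, sW, sC)
    windows = as_strided(x, shape=shape, strides=strides, writeable=False)
    return windows.max(axis=(3, 4))

x = np.random.randint(0, 9, (10, 24, 24, 3))
print(maxpool_strided(x, S=2, stride=2).shape)  # (10, 12, 12, 3)
```

With stride=1 this gives the same output shape as patchify followed by max; the shape arithmetic keeps every window inside the array, which matters because as_strided does no bounds checking.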

Answer 1
3, accepted, 2017-01-26 20:35:44
