How to implement maxpool: taking a maximum over a sliding window on an image or tensor
In short: I am looking for a simple numpy (maybe one-liner) implementation of MaxPool: the maximum over a window on a numpy.ndarray, for all locations of the window across dimensions.
In more detail: I am implementing a convolutional neural network ("CNN"), and one of the typical layers in such a network is the MaxPool layer (see for example here). Writing y = MaxPool(x, S), where x is an input ndarray and S is a parameter, the output of MaxPool is given in pseudocode by:
y[b,h,w,c] = max(x[b, s*h + i, s*w + j, c]) over i = 0,..., S-1; j = 0,...,S-1.
That is, y is an ndarray where the value at indexes b,h,w,c equals the maximum taken over the window of size S x S along the second and third dimensions of the input x, with the window "corner" placed at the indexes b,h,w,c.
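The pseudocode above can be sketched as a plain loop, which is handy as a reference when checking a faster version. This is only a minimal sketch: the function name maxpool_naive is hypothetical, and it assumes the lowercase s in the pseudocode (a stride that is never defined) equals 1, so that the output has H - S + 1 rows and W - S + 1 columns.

```python
import numpy as np

def maxpool_naive(x, S):
    """Reference max pooling with an S x S window (stride 1 is an assumption)."""
    B, H, W, C = x.shape
    out_h, out_w = H - S + 1, W - S + 1
    y = np.empty((B, out_h, out_w, C), dtype=x.dtype)
    for h in range(out_h):
        for w in range(out_w):
            # Window with its corner at (h, w); reduce over both spatial axes.
            y[:, h, w, :] = x[:, h:h + S, w:w + S, :].max(axis=(1, 2))
    return y
```

The loop runs over window positions only, so batch and channel dimensions are still handled by numpy in one call per position.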
Some additional details: The network is implemented using numpy. A CNN has many "layers", where the output of one layer is the input to the next layer. The inputs to the layers are numpy.ndarrays called "tensors". In my case the tensors are 4-dimensional numpy.ndarrays, x. That is, x.shape is a tuple (B,H,W,C). The size of each dimension changes after the tensor is processed by a layer; for example, the input to layer i = 4 can have size B = 10, H = 24, W = 24, C = 3, while the output, i.e. the input to layer i+1, has B = 10, H = 12, W = 12, C = 5. As indicated in the comments, the size after application of MaxPool is (B, H - S + 1, W - S + 1, C).
For concreteness: if I use
import numpy as np
y = np.amax(x, axis = (1,2))
where x.shape is, say, (2,3,3,4), this gives me what I want, but only for the degenerate case where the window I am maximizing over has size 3 x 3, the full size of the second and third dimensions of x, which is not exactly what I want.
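To make the degenerate case concrete, here is a small sketch with made-up input values. It shows that reducing over axes 1 and 2 collapses the whole 3 x 3 plane to a single maximum per batch entry and channel; the keepdims variant is added here only to show that the result has shape (B, 1, 1, C), which is what (B, H - S + 1, W - S + 1, C) gives for S = H = W = 3.

```python
import numpy as np

# Illustrative input only; any (2, 3, 3, 4) array behaves the same way.
x = np.arange(2 * 3 * 3 * 4).reshape(2, 3, 3, 4)

y = np.amax(x, axis=(1, 2))                      # one max per (b, c)
y_keep = np.amax(x, axis=(1, 2), keepdims=True)  # same values, axes kept

print(y.shape)       # (2, 4)
print(y_keep.shape)  # (2, 1, 1, 4)
```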
Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows, resulting in a 6D array of shape (B,H-S+1,W-S+1,S,S,C), and then simply performing max along the fourth and fifth axes, resulting in an output array of shape (B,H-S+1,W-S+1,C). The intermediate 6D array is a view into the input array and as such does not occupy any more memory. The subsequent max operation, being a reduction, makes efficient use of the sliding views.
Thus, an implementation would be -
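# Based on http://stackoverflow.com/a/41850409/3293881
import numpy as np

def patchify(img, patch_shape):
    a, X, Y, b = img.shape
    x, y = patch_shape
    # One (x, y) window per valid corner position along axes 1 and 2
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    # Window axes reuse the strides of the spatial axes, so no data is copied
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)

# x is the (B,H,W,C) input tensor and S the window size
out = patchify(x, (S,S)).max(axis=(3,4))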
Sample run -
In [224]: x = np.random.randint(0,9,(10,24,24,3))
In [225]: S = 5
In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True
In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)
In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)
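As an alternative to hand-written strides, the same windows can probably be obtained with np.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or newer is available. The sketch below is not part of the answer above: the function name maxpool_windows and its stride parameter are hypothetical, and the stride is applied simply by keeping every stride-th window position. Note that sliding_window_view appends the window axes at the end, so the reduction runs over the last two axes rather than axes 3 and 4.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def maxpool_windows(x, S, stride=1):
    # Windows over axes 1 and 2; the window axes are appended at the end,
    # giving a read-only view of shape (B, H-S+1, W-S+1, C, S, S).
    windows = sliding_window_view(x, (S, S), axis=(1, 2))
    # Keep every `stride`-th window position, then reduce each window.
    return windows[:, ::stride, ::stride].max(axis=(-2, -1))

x = np.random.randint(0, 9, (10, 24, 24, 3))
print(maxpool_windows(x, 5).shape)            # (10, 20, 20, 3)
print(maxpool_windows(x, 2, stride=2).shape)  # (10, 12, 12, 3)
```

With stride=1 this should match patchify(x, (S,S)).max(axis=(3,4)); with S = 2 and stride=2 it gives the non-overlapping pooling that halves H and W, as in the 24 to 12 example from the question.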