Keras Conv2D and input channels

Question

The Keras layer documentation specifies the input and output sizes for convolutional layers: https://keras.io/layers/convolutional/

Input shape: (samples, channels, rows, cols)

Output shape: (samples, filters, new_rows, new_cols)

And the kernel size is a spatial parameter, i.e. it determines only width and height.

So an input with c channels will yield an output with filters channels regardless of the value of c. It must therefore apply a 2D convolution with a spatial height x width filter and then somehow aggregate the results for each learned filter.

What is this aggregation operator? Is it a summation across channels? Can I control it? I couldn't find any information in the Keras documentation.

Note that in TensorFlow the filters are specified along the depth dimension as well: https://www.tensorflow.org/api_guides/python/nn#Convolution, so the depth operation is clear.

Thanks.

Answer 1

It might be confusing that it is called a Conv2D layer (it was to me, which is why I came looking for this answer), because, as Nilesh Birari commented:

I guess you are missing that it's a 3D kernel [width, height, depth]. So the result is a summation across channels.

Perhaps the 2D stems from the fact that the kernel only slides along two dimensions; the third dimension is fixed and determined by the number of input channels (the input depth).
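To make the "3D kernel that slides in 2D" idea concrete, here is a minimal NumPy sketch. It is not how Keras implements the layer; the function name conv2d_valid and the channels-last layout are my own assumptions for illustration. It only shows that each filter spans the full input depth, so the number of output channels equals the number of filters whatever the input depth is.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' cross-correlation with stride 1, channels-last layout.

    image:  (rows, cols, channels)
    kernel: (k_rows, k_cols, channels, filters)
    returns (new_rows, new_cols, filters)
    """
    rows, cols, channels = image.shape
    k_rows, k_cols, k_channels, filters = kernel.shape
    assert channels == k_channels, "kernel depth must match input channels"
    new_rows, new_cols = rows - k_rows + 1, cols - k_cols + 1
    out = np.zeros((new_rows, new_cols, filters))
    # The window moves along rows and cols only; depth is never slid over.
    for i in range(new_rows):
        for j in range(new_cols):
            patch = image[i:i + k_rows, j:j + k_cols, :]
            # Multiply by every filter and sum over height, width AND depth.
            out[i, j, :] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
for c in (1, 3, 16):
    image = rng.normal(size=(5, 5, c))
    kernel = rng.normal(size=(2, 2, c, 4))  # 4 filters, each of depth c
    print(c, conv2d_valid(image, kernel).shape)  # (4, 4, 4) for every c
```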

For a more elaborate explanation, read https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

I plucked an illustrative image from there:

Answer 2

I was also wondering this, and found another answer here, where it is stated (emphasis mine):

Maybe the most tangible example of a multi-channel input is when you have a color image which has 3 RGB channels. Let's get it to a convolution layer with 3 input channels and 1 output channel. (...) What it does is that it calculates the convolution of each filter with its corresponding input channel (...). The stride of all channels are the same, so they output matrices with the same size. Now, it sums up all matrices and output a single matrix which is the only channel at the output of the convolution layer.

Illustration:

Notice that the weights of the convolution kernels for each channel are different, and they are iteratively adjusted during back-propagation by gradient-descent-based algorithms such as stochastic gradient descent (SGD).
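The quoted description can be followed step by step in plain NumPy. This is only a sketch under my own assumptions: the helper correlate2d_valid, the toy 3×3 image and the three per-channel kernels are made up for illustration and are not taken from Keras or from the quoted answer. Each channel is filtered with its own 2×2 kernel, and the three resulting matrices are then added element-wise to give the single output channel.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """Single-channel 'valid' cross-correlation with stride 1."""
    k_rows, k_cols = kernel.shape
    new_rows = channel.shape[0] - k_rows + 1
    new_cols = channel.shape[1] - k_cols + 1
    out = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        for j in range(new_cols):
            out[i, j] = np.sum(channel[i:i + k_rows, j:j + k_cols] * kernel)
    return out

# Hypothetical 3x3 image with 3 channels (channels-last).
image = np.arange(27, dtype=float).reshape(3, 3, 3)
# One 2x2 kernel per input channel, each with different weights.
kernels = [np.full((2, 2), w) for w in (1.0, 0.5, -1.0)]

# Step 1: filter every channel with its own kernel (same stride, same size).
per_channel = [correlate2d_valid(image[:, :, c], kernels[c]) for c in range(3)]
# Step 2: the aggregation is an element-wise sum of the per-channel results.
output = np.sum(per_channel, axis=0)
print(output)
```

Stacking the three 2×2 kernels along a depth axis gives exactly the 3D kernel mentioned in Answer 1, so the two descriptions are the same operation.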

Here is a more technical answer from the TensorFlow API.

Answer 3

I also needed to convince myself, so I ran a simple example with a 3×3 RGB image.

# red    # green        # blue
1 1 1    100 100 100    10000 10000 10000
1 1 1    100 100 100    10000 10000 10000    
1 1 1    100 100 100    10000 10000 10000

The filter is initialised to ones:

1 1
1 1

I have also set the convolution to have these properties:

no padding
strides = 1
relu activation function
bias initialised to 0

We would expect the (aggregated) output to be:

40404 40404
40404 40404
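For anyone wondering where 40404 comes from, the arithmetic can be spelled out in a few lines of plain Python. The variable names are mine; the numbers are the ones from the image and filter above. Every 2×2 window covers four pixels in each channel, all weights are 1, the bias is 0, and relu leaves a positive value unchanged.

```python
# Each 2x2 window covers four pixels per channel, and every kernel weight is 1.
window_pixels = 2 * 2
channel_values = {"red": 1, "green": 100, "blue": 10000}

# Contribution of each channel to one output pixel.
per_channel = {name: window_pixels * value for name, value in channel_values.items()}
# Summing across channels gives the output pixel (bias is 0, relu is a no-op here).
total = sum(per_channel.values())

print(per_channel)  # {'red': 4, 'green': 400, 'blue': 40000}
print(total)        # 40404
```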

Also, from the picture above, the number of parameters is

3 separate filters (one for each channel) × 4 weights + 1 (bias, not shown) = 13 parameters
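The same count generalises to other layer sizes. The helper below, conv2d_param_count, is a hypothetical function of my own and not part of Keras; it assumes one bias per filter and a kernel that spans the full input depth, as in the example above.

```python
def conv2d_param_count(kernel_rows, kernel_cols, in_channels, filters, use_bias=True):
    """Trainable parameters of a standard 2D convolution layer."""
    # One kernel_rows x kernel_cols slice per input channel, for every filter.
    weights = kernel_rows * kernel_cols * in_channels * filters
    # One bias per filter (output channel), if biases are used.
    biases = filters if use_bias else 0
    return weights + biases

print(conv2d_param_count(2, 2, 3, 1))   # 13, the example in this answer
print(conv2d_param_count(3, 3, 3, 32))  # 896
```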

Here's the code.

Import modules:

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model

Create the red, green and blue channels:

red   = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue  = np.array([10000]*9).reshape((3,3))

Stack the channels to form an RGB image:

img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)

Create a model that just does a Conv2D convolution:

inputs = Input((3,3,3))
conv = Conv2D(filters=1, 
              strides=1, 
              padding='valid', 
              activation='relu',
              kernel_size=2, 
              kernel_initializer='ones', 
              bias_initializer='zeros', )(inputs)
model = Model(inputs,conv)

Input the image in the model:

model.predict(img)
# array([[[[40404.],
#          [40404.]],

#         [[40404.],
#          [40404.]]]], dtype=float32)

Run a summary to get the number of params:

model.summary()
