
How do Convolutional Layers (CNNs) work in keras?

I notice that in the keras documentation there are many different types of Conv layers, i.e. Conv1D, Conv2D, Conv3D.

All of them have parameters like filters, kernel_size, strides, and padding, which aren't present in other keras layers.

I have seen images like this which "visualize" Conv layers,

[image: visualization of a CNN's convolutional layers]

but I don't understand what's going on in the transition from one layer to the next.

How does changing the above parameters and the dimensions of our Conv layer affect what happens in the model?

Convolutions - Language Agnostic Basics

To understand how convolutions work in keras, we need a basic understanding of how convolutions work in a language-agnostic setting.

[image: animation of a 3x3 kernel sliding across a padded 2D input to produce an activation map]

Convolutional layers slide across an input to build an activation map (also called a feature map). The above is an example of a 2D convolution. Notice how at each step, the 3 x 3 dark square slides across the input (blue), and for each new 3 x 3 part of the input it analyzes, it outputs a value in our output activation map (blue-green boxes at the top).
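To make the sliding-window idea concrete, here is a minimal, framework-free sketch in plain NumPy (the 5 x 5 input and averaging kernel are made up for illustration, not taken from the diagram):

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Slide a 2D kernel across a 2D input and build the activation map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise multiply the kernel with the current patch, then sum
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            activation_map[i, j] = np.sum(patch * kernel)
    return activation_map

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5x5 input
kernel = np.ones((3, 3)) / 9.0                      # toy 3x3 averaging kernel
print(conv2d_naive(image, kernel, stride=2).shape)  # (2, 2)
```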

Kernels and filters

The dark square is our kernel. The kernel is a matrix of weights that is multiplied with each part of our input. All the results from these multiplications put together form our activation map.

Intuitively, our kernel lets us reuse parameters - a weights matrix that detects an eye in this part of the image will work to detect it elsewhere; there's no point in training different parameters for each part of our input when one kernel can sweep across and work everywhere. We can consider each kernel as a feature-detector for one feature, and its output activation map as a map of how likely that feature is present in each part of your input.

A synonym for kernel is filter. The parameter filters is asking for the number of kernels (feature-detectors) in that Conv layer. This number will also be the size of the last dimension in your output, ie filters=10 will result in an output shape of (???, 10). This is because the output of each Conv layer is a set of activation maps, and there will be filters number of activation maps.
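As a rough tf.keras sketch (the 32 x 32 x 3 input shape and filter count are arbitrary, chosen just to show the shapes):

```python
import tensorflow as tf

# 32x32 RGB images; 10 filters -> 10 activation maps stacked in the last axis
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(filters=10, kernel_size=(3, 3))(inputs)
print(x.shape)  # (None, 30, 30, 10) -- last dimension equals filters
```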

Kernel Size

The kernel_size is, well, the size of each kernel. We discussed earlier that each kernel consists of a weights matrix that is tuned to detect certain features better and better. kernel_size dictates the size of the filter mask. In English, how much "input" is processed during each convolution. For example our above diagram processes a 3 x 3 chunk of the input each time. Thus, it has a kernel_size of (3, 3). We can also call the above operation a "3x3 convolution".

Larger kernel sizes are almost unconstrained in the features they represent, while smaller ones are restricted to specific low-level features. Note though that multiple layers of small kernel sizes can emulate the effect of a larger kernel size.
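A small tf.keras sketch of how kernel_size affects both the patch each output value sees and the output shape (arbitrary shapes, default padding="valid"):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))
small = tf.keras.layers.Conv2D(8, kernel_size=(3, 3))(inputs)  # each output value sees a 3x3 patch
large = tf.keras.layers.Conv2D(8, kernel_size=(7, 7))(inputs)  # each output value sees a 7x7 patch
print(small.shape)  # (None, 30, 30, 8)
print(large.shape)  # (None, 26, 26, 8)
```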

Strides

Notice how our above kernel shifts two units each time. The amount that the kernel "shifts" for each computation is called strides, so in keras speak our strides=2. Generally speaking, as we increase the number of strides, our model loses more information from one layer to the next, due to the activation map having "gaps".
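A quick tf.keras sketch of the effect of strides on the output shape (arbitrary input shape, default "valid" padding):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))
s1 = tf.keras.layers.Conv2D(8, (3, 3), strides=1)(inputs)
s2 = tf.keras.layers.Conv2D(8, (3, 3), strides=2)(inputs)  # kernel jumps 2 units each step
print(s1.shape)  # (None, 30, 30, 8)
print(s2.shape)  # (None, 15, 15, 8) -- roughly half the spatial resolution
```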

Padding

Going back to the above diagram, notice the ring of white squares surrounding our input. This is our padding. Without padding, each time we pass our input through a Conv layer, the shape of our result gets smaller and smaller. As a result we pad our input with a ring of zeros, which serves a few purposes:

  1. Preserve information around the edges. From our diagram notice how each corner white square only goes through the convolution once, while center squares go through four times. Adding padding alleviates this problem - squares originally on the edge get convolved more times.

  2. padding is a way to control the shape of our output. We can make shapes easier to work with by keeping the output of each Conv layer the same shape as our input, and we can make deeper models when our shape doesn't get smaller each time we use a Conv layer.

keras provides three different types of padding. The explanations in the docs are very straightforward so they are copied / paraphrased here. These are passed in with padding=..., ie padding="valid". A short shape comparison follows the list of options below.

valid: no padding

same: padding the input such that the output has the same length as the original input

causal: results in causal (dilated) convolutions. Normally in the above diagram the "center" of the kernel maps to the value in the output activation map. With causal convolutions the right edge is used instead. This is useful for temporal data, where you don't want to use future data to model present data.
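Here is a rough tf.keras comparison of the three options on a toy 1D input (the 100-step, 16-channel shape is made up; causal is only available on Conv1D):

```python
import tensorflow as tf

# "valid" shrinks the output, "same" keeps the input length, and "causal" keeps
# the length while only looking at current and past timesteps
inputs = tf.keras.Input(shape=(100, 16))  # e.g. 100 timesteps, 16 channels
valid = tf.keras.layers.Conv1D(8, 5, padding="valid")(inputs)
same = tf.keras.layers.Conv1D(8, 5, padding="same")(inputs)
causal = tf.keras.layers.Conv1D(8, 5, padding="causal")(inputs)
print(valid.shape, same.shape, causal.shape)
# (None, 96, 8) (None, 100, 8) (None, 100, 8)
```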

Conv1D, Conv2D, and Conv3D

Intuitively the operations that occur on these layers remain the same. Each kernel still slides across your input, each filter outputs an activation map for its own feature, and the padding is still applied.

The difference is the number of dimensions that are convolved. For example in Conv1D a 1D kernel slides across one axis. In Conv2D a 2D kernel slides across two axes.

It is very important to note that the D in an XD Conv layer doesn't denote the number of dimensions of the input, but rather the number of axes that the kernel slides across.

[image: a Conv2D layer applied to an RGB image - the kernel slides along rows and cols only]

For example, in the above diagram, even though the input is 3D (image with RGB channels), this is an example of a Conv2D layer. This is because there are two spatial dimensions - (rows, cols), and the filter only slides along those two dimensions. You can consider this as being convolutional in the spatial dimensions and fully connected in the channels dimension.

The output for each filter is also two dimensional. This is because each filter slides in two dimensions, creating a two dimensional output. As a result you can also think of an ND Conv as each filter outputting an ND output.

[image: a Conv1D layer - the kernel slides along a single axis of a 2D input]

You can see the same thing with Conv1D (pictured above). While the input is two dimensional, the filter only slides along one axis, making this a 1D convolution.

In keras, this means that ConvND will require each sample to have N+1 dimensions - N dimensions for the filter to slide across and one additional channels dimension.
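A quick tf.keras sketch of the shapes involved (all input shapes are arbitrary examples):

```python
import tensorflow as tf

# ConvND expects N spatial axes plus one channels axis per sample
seq = tf.keras.Input(shape=(100, 16))        # Conv1D: (steps, channels)
img = tf.keras.Input(shape=(32, 32, 3))      # Conv2D: (rows, cols, channels)
vol = tf.keras.Input(shape=(16, 32, 32, 3))  # Conv3D: (depth, rows, cols, channels)

print(tf.keras.layers.Conv1D(4, 3)(seq).shape)  # (None, 98, 4)
print(tf.keras.layers.Conv2D(4, 3)(img).shape)  # (None, 30, 30, 4)
print(tf.keras.layers.Conv3D(4, 3)(vol).shape)  # (None, 14, 30, 30, 4)
```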

TLDR - Keras wrap up

filters: The number of different kernels in the layer. Each kernel detects and outputs an activation map for a specific feature, making this the last value in the output shape. E.g. Conv1D outputs (batch, steps, filters).

kernel_size: Determines the dimensions of each kernel / filter / feature-detector. Also determines how much of the input is used to calculate each value in the output. Larger size = detecting more complex features, fewer constraints; however it's prone to overfitting.

strides: How many units you move to take the next convolution. Bigger strides = more information loss.

padding: Either "valid", "causal", or "same". Determines if and how to pad the input with zeros.

1D vs 2D vs 3D: Denotes the number of axes that the kernel slides across. An ND Conv layer will output an ND output for each filter, but will require an N+1 dimensional input for each sample. This is composed of N dimensions to slide across plus one additional channels dimension.
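To tie the parameters together, here is a tiny illustrative tf.keras model (layer sizes are arbitrary; the commented shapes are what model.summary() should report):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), strides=1, padding="same"),
    tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=2, padding="valid"),
])
model.summary()
# first Conv2D:  (None, 28, 28, 16)  -- "same" padding preserves 28x28
# second Conv2D: (None, 13, 13, 32)  -- strides=2 roughly halves the spatial size
```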

References:

Intuitive understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks

https://keras.io/layers/convolutional/

http://cs231n.github.io/convolutional-networks/
