
Kernel size change in convolutional neural networks

I have been working on creating a convolutional neural network from scratch, and am a little confused about how to treat kernel size for hidden convolutional layers. For example, say I have an MNIST image as input (28 x 28) and put it through the following layers.

Convolutional layer with kernel_size = (5,5) and 32 output channels

  • new dimension of throughput = (32, 28, 28)

Max pooling layer with pool_size (2,2) and stride (2,2)

  • new dimension of throughput = (32, 14, 14)

If I now want to create a second convolutional layer with kernel size = (5x5) and 64 output channels, how do I proceed? Does this mean that I only need two new filters (2 x 32 existing channels), or does the kernel size change to (32 x 5 x 5) since there are already 32 input channels?

Since the initial input was a 2D image, I do not know how to conduct convolution for the hidden layer, since the input is now 3-dimensional (32 x 14 x 14).
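For concreteness, here is a minimal PyTorch sketch (added purely for illustration; the question is about a from-scratch implementation) that reproduces the shapes listed above. It assumes padding=2, which is what keeps a 5x5 convolution from shrinking the 28x28 input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)          # one grayscale MNIST image

conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, padding=2)
pool  = nn.MaxPool2d(kernel_size=2, stride=2)
conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, padding=2)

print(conv1(x).shape)                  # torch.Size([1, 32, 28, 28])
print(pool(conv1(x)).shape)            # torch.Size([1, 32, 14, 14])
print(conv2(pool(conv1(x))).shape)     # torch.Size([1, 64, 14, 14])
print(conv2.weight.shape)              # torch.Size([64, 32, 5, 5])
```

The last line shows the weight shape the accepted answer below arrives at: 64 kernels, each spanning all 32 input channels.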

You need 64 kernels, each with the size (32, 5, 5).

The depth (number of channels) of the kernels -- 32 in this case, 3 for an RGB image, 1 for grayscale, etc. -- should always match the input depth, but the values are all the same. E.g., if you have a 3x3 kernel like this: [-1 0 1; -2 0 2; -1 0 1], and you now want to convolve it with an input of depth (i.e., channel count) N, you just copy this 3x3 kernel N times along the third dimension. The math that follows is just like the one-channel case: you multiply the kernel values with the entries the kernel window currently sits on, then sum over all values in all N channels to get the value of just one entry, or pixel. So what you get as output in the end is a matrix with one channel :) How much depth do you want your matrix for the next layer to have? That's the number of kernels you should apply. Hence, in your case, it would be a kernel of size (64 x 32 x 5 x 5), which is actually 64 kernels, each with 32 channels and the same 5x5 values in all channels.
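To make the arithmetic above concrete, here is a minimal NumPy sketch (not from the original answer): each output pixel is the sum, over all input channels and kernel positions, of input value times kernel weight. It uses "valid" mode (no padding), so a 14x14 input shrinks to 10x10:

```python
import numpy as np

def conv_multi_channel(x, kernels):
    """Valid-mode convolution (strictly: cross-correlation, as in most CNN code).
    x:       input,   shape (C_in, H, W)
    kernels: weights, shape (C_out, C_in, kH, kW)
    returns: output,  shape (C_out, H - kH + 1, W - kW + 1)
    """
    c_out, c_in, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for o in range(c_out):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # One output pixel: multiply the (C_in, kH, kW) window by the
                # kernel and sum over ALL channels and spatial positions.
                window = x[:, i:i + kh, j:j + kw]
                out[o, i, j] = np.sum(window * kernels[o])
    return out

x = np.random.randn(32, 14, 14)        # the (32, 14, 14) hidden input
k = np.random.randn(64, 32, 5, 5)      # 64 kernels, each of size (32, 5, 5)
print(conv_multi_channel(x, k).shape)  # (64, 10, 10) without padding
```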

("I am not a very confident english speaker hope you get what I said, it would be nice if someone edit this :)") (“我说英语的人不是很自信,希望您能理解我的意思,如果有人编辑它,那会很好:)”)

You essentially answered your own question. YOU are building the network solver. It seems like your convolutional layer output is [channels out] = [channels in] * [number of kernels]; I had to infer this from the wording of your question.

In general, this is how it works: you specify the kernel size of the layer and how many kernels to use. Since you have one input channel, you are essentially saying that there are 32 kernels in your first convolution layer. That is 32 unique 5x5 kernels. Each of these kernels will be applied to the one input channel. More generally, each of the layer kernels (32 in your example) is applied to each of the input channels. And that is the key. If you build code to implement the convolution layer according to these generalities, then your subsequent convolution layers are done. In the next layer you specify two kernels per channel. In your example there would be 32 input channels, the hidden layer has 2 kernels per channel, and the output would be 64 channels. You could then downsample by applying a pooling layer, then flatten the 64 channels [turn a matrix into a vector by stacking the columns or rows], and pass the result as a column vector into a fully connected network. That is the basic scheme of convolutional networks.

The work comes when you try to code up backpropagation through the convolutional layers, but the OP didn't ask about that. I'll just say this: you will come to a place where you have the stored input matrix (one channel), you have a gradient from a lower layer in the form of a matrix the size of the layer kernel, and you need to backpropagate it up to the next convolutional layer. The simple approach is to rotate your stored channel matrix by 180 degrees and then convolve it with the gradient. The explanation for this is long and tedious, too much to write here, and not a lot on the internet explains it well. A more sophisticated approach is to apply "correlation" between the input gradient and the stored channel matrix. Note I specifically said "correlation" as opposed to "convolution", and that is key. If you think they are "almost" the same thing, then I recommend you take some time and learn about the differences.
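On that final point, the difference between correlation and convolution is easy to see numerically. Here is a short SciPy sketch (an illustration added here, not the answerer's code): 2D convolution is just correlation with the kernel rotated by 180 degrees, which is exactly the rotation trick mentioned above:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.random.randn(6, 6)
k = np.random.randn(3, 3)

# Correlation slides the kernel as-is; convolution flips it 180 degrees first.
corr = correlate2d(x, k, mode='valid')
conv = convolve2d(x, k, mode='valid')

# Hence convolving with k equals correlating with the 180-degree-rotated k:
print(np.allclose(conv, correlate2d(x, np.rot90(k, 2), mode='valid')))  # True
print(np.allclose(corr, conv))                                          # False
```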

If you would like to have a look at my CNN solver, here's a link to the project. It's C++ and has no documentation, sorry :) It's all in a header file called layer.h; find the class FilterLayer2D. I think the code is pretty readable (what programmer doesn't think his code is readable :) ) https://github.com/sraber/simpl.net.git

I also wrote a paper on basic fully connected networks. I wrote it so that I wouldn't forget what I learned in my self-study. Maybe you'll get something out of it. It's at this link: http://www.raberfamily.com/scottblog/scottblog.htm
