简体   繁体   中英

Kernel size change in convolutional neural networks

I have been working on creating a convolutional neural.network from scratch, and am a little confused on how to treat kernel size for hidden convolutional layers. For example, say I have an MNIST image as input (28 x 28) and put it through the following layers.

Convolutional layer with kernel_size = (5,5) with 32 output channels

  • new dimension of throughput = (32, 28, 28)

Max Pooling layer with pool_size (2,2) and step (2,2)

  • new dimension of throughput = (32, 14, 14)

If I now want to create a second convolutional layer with kernel size = (5x5) and 64 output channels, how do I proceed? Does this mean that I only need two new filters (2 x 32 existing channels) or does the kernel size change to be (32 x 5 x 5) since there are already 32 input channels?

Since the initial input was a 2D image, I do not know how to conduct convolution for the hidden layer since the input is now 3 dimensional (32 x 14 x 14) .

you need 64 kernel, each with the size of (32,5,5) .

depth(#channels) of kernels, 32 in this case, or 3 for a RGB image, 1 for gray scale etc, should always match the input depth, but values are all the same. eg if you have a 3x3 kernel like this : [-1 0 1; -2 0 2; -1 0 1] and now you want to convolve it with an input with N as depth or say channel, you just copy this 3x3 kernel N times in 3rd dimension, the following math is just like the 1 channel case, you sum all values in all N channels which your kernel window is currently on them after multiplying the kernel values with them and get the value of just 1 entry or pixel. so what you get as output in the end is a matrix with 1 channel:) how much depth you want your matrix for next layer to have? that's the number of kernels you should apply. hence in your case it would be a kernel with this size (64 x 32 x 5 x 5) which is actually 64 kernels with 32 channels for each and same 5x5 values in all cahnnels.

("I am not a very confident english speaker hope you get what I said, it would be nice if someone edit this :)")

You essentially answered your own question. YOU are building the.network solver. It seems like your convolutional layer output is [channels out] = [channels in] * [number of kernels]. I had to infer this from the wording of your question. In general, this is how it works: you specify the kernel size of the layer and how many kernels to use. Since you have one input channel you are essentially saying that there are 32 kernels in your first convolution layer. That is 32 unique 5x5 kernels. Each of these kernels will be applied to the one input channel. More in general, each of the layer kernels (32 in your example) is applied to each of the input channels. And that is the key. If you build code to implement the convolution layer according to these generalities, then your subsequent convolution layers are done. In the next layer you specify two kernels per channel. In your example there would be 32 input channels, the hidden layer has 2 kernels per channel, and the output would be 64 channels. You could then down sample by applying a pooling layer, then flatten the 64 channels [turn a matrix into a vector by stacking the columns or rows], and pass it as a column vector into a fully connected.network. That is the basic scheme of convolutional.networks. The work comes when you try to code up backpropagation through the convolutional layers. But the OP didn't ask about that. I'll just say this, you will come to a place where you have the stored input matrix (one channel), you have a gradient from a lower layer in the form of a matrix and is the size of the layer kernel, and you need to backpropagate it up to the next convolutional layer. The simple approach is to rotate your stored channel matrix by 180 degrees and then convolve it with the gradient. The explanation for this is long and tedious, too much to write here, and not a lot on the inte.net explains it well. A more sophisticated approach is to apply “correlation” between the input gradient and the stored channel matrix. Note I specifically said “correlation” as opposed to “convolution” and that is key. If you think they “almost” the same thing, then I recommend you take some time and learn about the differences.

If you would like to have a look at my CNN solver here's a link to the project. It's C++ and no documentation, sorry:) It's all in a header file called layer.h, find the class FilterLayer2D. I think the code is pretty readable (what programmer doesn't think his code is readable:) ) https://github.com/sraber/simpl.net.git

I also wrote a paper on basic fully connected.networks. I wrote it so that I would forget what I learned in my self study. Maybe you'll get something out of it. It's at this link: http://www.raberfamily.com/scottblog/scottblog.htm

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM