简体   繁体   中英

Pooling Layer vs. Using Padding in Convolutional Layers

My understanding is that we use padding when we convolute because convoluting with filters reduces the dimension of the output by shrinking it, as well as loses information from the edges/corners of the input matrix. However, we also use a pooling layer after a number of Conv layers in order to downsample our feature maps. Doesn't this seem sort of contradicting? We use padding because we do NOT want to reduce the spatial dimensions but we later use pooling to reduce the spatial dimensions. Could someone provide some intuition behind these 2?

Without loss of generality, assume we are dealing with images as inputs. The reasons behind padding is not only to keep the dimensions from shrinking, it's also to ensure that input pixels on the corners and edges of the input are not "disadvantaged" in affecting the output. Without padding, a pixel on the corner of an images overlaps with just one filter region, while a pixel in the middle of the image overlaps with many filter regions. Hence, the pixel in the middle affects more units in the next layer and therefore has a greater impact on the output. Secondly, you actually do want to shrink dimensions of your input (Remember, Deep Learning is all about compression, ie to find low dimensional representations of the input that disentangle the factors of variation in your data). The shrinking induced by convolutions with no padding is not ideal and if you have a really deep net you would quickly end up with very low dimensional representations that lose most of the relevant information in the data. Instead you want to shrink your dimensions in a smart way, which is achieved by Pooling. In particular, Max Pooling has been found to work well. This is really an empirical result, ie there isn't a lot of theory to explain why this is the case. You could imagine that by taking the max over nearby activations, you still retain the information about the presence of a particular feature in this region, while losing information about its exact location. This can be good or bad. Good because it buys you translation invariance, and bad because exact location may be relevant for you problem.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM