
How to understand the convolutional layer output shape

I am a bit confused about the output shape of a convolutional layer. For example, as the image below shows, applying 2 filters to a 6 x 6 x 3 image finally outputs 4 x 4 x 2, so the three color channels are fused into one layer per filter. But in some networks, the color channels still seem to be kept after the convolution layer. For example, with model.add(Conv2D(32, kernel_size=5, strides=1, activation=None, input_shape=(128,128,3))), the printed shape of this layer is conv2d_5 (5, 5, 3, 32). My question is that I didn't see any specific code saying whether the color channel is kept or not.

[image: convolution of a 6 x 6 x 3 input with two 3 x 3 filters producing a 4 x 4 x 2 output]

In the example image posted by OP, for an input of size 6 x 6 x 3 ( input_dim=6, channel_in=3 ) with 2 filters of size 3 x 3 ( filter_size=3 ), the spatial dimension can be computed as (input_dim - filter_size + 2 * padding) / stride + 1 = (6 - 3 + 2 * 0)/1 + 1 = 4 (where padding=0 and stride=1 ).
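As a quick sanity check, the formula can be evaluated directly; conv_output_size below is a small helper defined here just for illustration:

# helper for the output-size formula above (illustrative, not from any library)
def conv_output_size(input_dim, filter_size, padding=0, stride=1):
    return (input_dim - filter_size + 2 * padding) // stride + 1

print(conv_output_size(6, 3))  # 4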

Thus the 4 x 4 feature map. The operation used in a standard CNN layer to compute each element of this feature map is that of a fully-connected layer. Consider the example filter and image patch below (from CS231n ):

[image: example 3 x 3 x 3 filter w0 (with bias b0) and 3 x 3 x 3 input patch, from the CS231n convolution demo]

then the output element is computed as:

import numpy as np

# filter weights of size 3 x 3 x 3
w0 = np.array([
    [[0., -1., 0.],
     [1., -1., 0.],
     [0., -1., 0.]],
    [[0., 1., -1.],
     [-1., 1., 0.],
     [1., -1., 0.]],
    [[-1., 0., 0.],
     [0., -1., -1.],
     [1., -1., 0.]]
])
# bias value for the filter
b0 = 1

# an input image patch 3 x 3 x 3
x_patch = np.array([
    [[0., 0., 0.],
     [0., 2., 1.],
     [0., 1., 1.]],
    [[0., 0., 0.],
     [0., 0., 1.],
     [0., 0., 1.]],
    [[0., 0., 0.],
     [0., 0., 0.],
     [0., 0., 2.]]
])

# define the operation for each channel
>>> op = lambda xs, ws: np.sum(xs*ws)
>>> op(x_patch[:, :, 0], w0[:, :, 0]) # channel 1
0.0
>>> op(x_patch[:, :, 1], w0[:, :, 1]) # channel 2
-3.0
>>> op(x_patch[:, :, 2], w0[:, :, 2]) # channel 3
0.0

# add the values for each channel (this is where 
# channel dimension is summed over) plus the bias
>>> 0.0 + (-3.0) + 0.0 + b0
-2.0

# or simply
>>> np.sum(x_patch * w0) + b0
-2.0
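To see where the full 4 x 4 feature map comes from, this same operation is simply repeated at every spatial position. A minimal sketch, assuming a random 6 x 6 x 3 input x and the w0, b0 defined above:

# slide the 3 x 3 x 3 filter over a 6 x 6 x 3 input (stride 1, no padding)
x = np.random.rand(6, 6, 3)
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        # channels are multiplied elementwise and summed over, as above
        out[i, j] = np.sum(x[i:i+3, j:j+3, :] * w0) + b0
print(out.shape)  # (4, 4) -- one such map per filter, hence 4 x 4 x 2 for 2 filters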

This is generally the case for a CNN, which can alternatively be visualized as:

[image: standard convolution, where the channel dimension is summed over]

compared to depth-wise convolution, where the channel dimension is kept as is:

[image: depth-wise convolution, one output channel per input channel]

TensorFlow provides separate implementations for each in tf.keras.layers.Conv2D ( here ) and tf.keras.layers.DepthwiseConv2D ( here ), so you can choose according to your application.
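A minimal sketch contrasting the two layers on the 6 x 6 x 3 example from above (the input here is random and just for shape checking):

import tensorflow as tf

x = tf.random.normal((4, 6, 6, 3))  # batch of 4 random 6 x 6 RGB inputs

# standard convolution: channels are summed over, output depth = number of filters
print(tf.keras.layers.Conv2D(2, kernel_size=3)(x).shape)        # (4, 4, 4, 2)

# depth-wise convolution: each input channel is filtered independently, so the
# channel dimension is kept (times depth_multiplier, which is 1 by default)
print(tf.keras.layers.DepthwiseConv2D(kernel_size=3)(x).shape)  # (4, 4, 4, 3)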

I cannot reproduce the output dimension of 5 x 5 x 3 x 32 for your second example (using tf v2.9.0):

import tensorflow as tf

# The inputs are 128 x 128 RGB images with 
# `data_format=channels_last` (by default) and 
# the batch size is 4.
>>> input_shape = (4, 128, 128, 3)
>>> x = tf.random.normal(input_shape)
>>> y = tf.keras.layers.Conv2D(
 32, 
 kernel_size=5, 
 strides=1, 
 activation=None, 
 input_shape=(128, 128, 3)
)(x)
>>> print(y.shape)
(4, 124, 124, 32)

The example code is slightly adjusted from the official documentation example . Note that the (5, 5, 3, 32) you print below comes from layer.get_weights(), which returns the filter weights of shape (kernel_height, kernel_width, input_channels, filters); it is the kernel shape, not the layer's output shape. The None in model.summary() is simply the batch dimension, which is left unspecified until you actually pass data.
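A short standalone check of both shapes (the random input here is just for shape inspection):

# the kernel weights have shape (5, 5, 3, 32); the output does not
layer = tf.keras.layers.Conv2D(32, kernel_size=5, strides=1, activation=None)
y = layer(tf.random.normal((4, 128, 128, 3)))  # calling the layer builds its weights
kernel, bias = layer.get_weights()
print(kernel.shape)  # (5, 5, 3, 32) -> (kernel_h, kernel_w, in_channels, filters)
print(bias.shape)    # (32,)
print(y.shape)       # (4, 124, 124, 32)
# parameter count matches the summary below: 5*5*3*32 + 32 = 2432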

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPool2D, Dropout, Flatten, Dense)

model = Sequential()

model.add(Conv2D(32, kernel_size=5, strides=1, activation=None, input_shape=(128,128,3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D(2,2))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(units=1))
model.add(Activation('sigmoid'))


for layer in model.layers:
    # check for convolutional layer
    if 'conv' not in layer.name:
        continue
    # get filter weights
    filters, biases = layer.get_weights()
    print(layer.name, filters.shape)

So when I print the conv layer shape it shows as conv2d_46 (5, 5, 3, 32), but when I print the summary the output shape shows differently. What is None?

Layer (type)                                Output Shape          Param #
conv2d_45 (Conv2D)                          (None, 124, 124, 32)  2432
batch_normalization_38 (BatchNormalization) (None, 124, 124, 32)  128
activation_36 (Activation)                  (None, 124, 124, 32)  0
max_pooling2d_17 (MaxPooling2D)             (None, 62, 62, 32)    0
dropout_26 (Dropout)                        (None, 62, 62, 32)    0
flatten_11 (Flatten)                        (None, 123008)        0
dense_23 (Dense)                            (None, 64)            7872576
dropout_27 (Dropout)                        (None, 64)            0
dense_24 (Dense)                            (None, 1)             65
activation_37 (Activation)                  (None, 1)             0
