

CNN Architecture with TensorFlow

I am implementing the architecture from this paper, https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf, using TensorFlow.


I have formatted my input to be 224, 224, 3 and have the following TensorFlow layers. The issue is that the output size of conv1 is not 110 by 110 by 96 as the paper states, but 109 by 109 by 96. How can I fix this?

I have followed the hyperparameters specified on page 8 of the paper. My only idea is that the padding may be incorrect (as TensorFlow sets it for you).

My code is as below:

# Input Layer
# Reshape X to 4-D tensor: [batch_size, height, width, channels]
input_layer = tf.reshape(features["x"], [-1, IMG_SIZE, IMG_SIZE, 3])

print(input_layer.get_shape(), '224, 224, 3')
# Convolutional Layer #1
# Input Tensor Shape: [batch_size, 224, 224, 3]
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=96,  # conv1 has 96 output channels
    kernel_size=[7, 7],
    strides=2,
    padding="valid")  # padding = 1
print(conv1.get_shape(), '110, 110, 96')

# Max Pooling Layer
# Input Tensor Shape: [batch_size, 110, 110, 96]
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[3, 3], strides=2)

# Contrast Normalisation
# Input Tensor Shape: [batch_size, 55, 55, 96]
contrast_norm1 = tf.nn.local_response_normalization(pool1)
print(contrast_norm1.get_shape(), '55, 55, 96')

# The rest of the CNN...

Output (actual dimensions in parentheses, followed by the desired dimensions from the paper):

(?, 224, 224, 3) 224, 224, 3  # Input
(?, 109, 109, 96) 110, 110, 96  # Following conv1
(?, 54, 54, 96) 55, 55, 96  # Following contrast_norm1

The output height and width of a convolution with valid padding can be calculated as:

output_size = (input_size - kernel_size) // stride + 1

In your case, the output of the first layer is:

output_size = (224 - 7) // 2 + 1 = 217 // 2 + 1 = 109
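
The formula above can be checked with a few lines of plain Python. This is only a minimal sketch: `valid_output_size` is a hypothetical helper name introduced here for illustration, not a TensorFlow function.

```python
def valid_output_size(input_size, kernel_size, stride):
    """Output length along one spatial dimension with 'valid' padding (no padding added)."""
    return (input_size - kernel_size) // stride + 1


# conv1 as configured in the question: 224 input, 7x7 kernel, stride 2
print(valid_output_size(224, 7, 2))  # 109
```

The same helper can be used to try other input sizes or kernel sizes before changing the network itself.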

One way to make the output of the first layer equal to 110 is to set the kernel size to 6x6, since (224 - 6) // 2 + 1 = 110. The other way is to add padding of size 1 using tf.pad:

import numpy as np
import tensorflow as tf

# suppose this is a batch of 10 images of size 4x4x3
data = np.ones((10, 4, 4, 3), dtype=np.float32)

paddings = [[0, 0], # no values are added along batch dim
            [1, 0], # add one value before the content of height dim
            [1, 0], # add one value before the content of width dim
            [0, 0]] # no values are added along channel dim

padded_data = tf.pad(tensor=data, paddings=paddings)  # pads with zeros by default

sess = tf.InteractiveSession()
output = sess.run(padded_data)

print(output.shape)
# >>> (10, 5, 5, 3)

# print content of first channel of first image
print(output[0, :, :, 0])
# >>> [[0. 0. 0. 0. 0.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]]

In the example above, zero-padding of size 1 is added before the content of the height and width dimensions. The paddings argument should have shape [number_of_dimensions, 2], i.e. for each dimension of the input tensor you specify how many values to add before and after its content.
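
To make the [number_of_dimensions, 2] layout concrete, the resulting shape can be worked out without running TensorFlow. The sketch below uses a hypothetical helper, `padded_shape`, written only for illustration; it mirrors how the shape changes, not how tf.pad is implemented.

```python
def padded_shape(shape, paddings):
    """Shape after padding: each dimension grows by its (before + after) counts."""
    if len(shape) != len(paddings):
        raise ValueError("paddings needs one [before, after] pair per dimension")
    return tuple(dim + before + after
                 for dim, (before, after) in zip(shape, paddings))


paddings = [[0, 0],  # batch: unchanged
            [1, 0],  # height: one value before, none after
            [1, 0],  # width: one value before, none after
            [0, 0]]  # channels: unchanged

print(padded_shape((10, 4, 4, 3), paddings))  # (10, 5, 5, 3)
```

Each [before, after] pair is independent, so padding on both sides of a dimension would be written as [1, 1] for that dimension.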

If you apply this padding to your input data, it will result in a tensor of shape batch x 225 x 225 x 3, and thus the output height and width of the convolutional layer will be 110x110, since (225 - 7) // 2 + 1 = 110.
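
As a sanity check of these numbers, and of the pooling layer that follows conv1, the shape arithmetic can be traced in plain Python. This is a sketch under stated assumptions: the helper names are hypothetical, the pooling parameters (3x3 window, stride 2) are taken from the question's code, and the "same" rule used is TensorFlow's ceil(input / stride). It suggests that even with a 110x110 conv1 output, a pooling layer left at valid padding would give 54 rather than 55, so pool1 may also need padding="same" to match the paper.

```python
import math


def valid_size(n, kernel, stride):
    """Output length with 'valid' padding."""
    return (n - kernel) // stride + 1


def same_size(n, stride):
    """Output length with 'same' padding (independent of kernel size)."""
    return math.ceil(n / stride)


padded = 224 + 1                      # one row/column of zeros added via tf.pad
conv1 = valid_size(padded, 7, 2)      # 7x7 kernel, stride 2
pool_valid = valid_size(conv1, 3, 2)  # 3x3 pool, stride 2, valid padding
pool_same = same_size(conv1, 2)       # same pool with 'same' padding

print(conv1, pool_valid, pool_same)  # 110 54 55
```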
