
CNN Architecture with TensorFlow

I am implementing the architecture found within this paper https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf using TensorFlow.


I have formatted my input to be 224 x 224 x 3 and have the following TensorFlow layers. The issue I am getting is that the output size of conv1 is not 110 x 110 x 96 as the paper states, but rather 109 x 109 x 96. How can I fix this?

I have followed the hyperparameters specified in the paper on page 8. My only idea is that the padding may be incorrect (as TensorFlow sets it for you).

My code is as below:

# Input Layer
# Reshape X to 4-D tensor: [batch_size, width, height, channels]
input_layer = tf.reshape(features["x"], [-1, IMG_SIZE, IMG_SIZE, 3])

print(input_layer.get_shape(), '224, 224, 3')
# Convolutional Layer #1
# Input Tensor Shape: [batch_size, 224, 224, 3]
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=96,
    kernel_size=[7, 7],
    strides=2,
    padding="valid",  # padding = 1
    activation=tf.nn.relu)
print(conv1.get_shape(), '110, 110, 96')

# Max Pooling Layer
# Input Tensor Shape: [batch_size, 110, 110, 96]
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[3, 3], strides=2)

# Contrast Normalisation
# Input Tensor Shape: [batch_size, 55, 55, 96]
contrast_norm1 = tf.nn.local_response_normalization(
    pool1,
    depth_radius=5,
    bias=1,
    alpha=0.0001,
    beta=0.75)
print(contrast_norm1.get_shape(), '55, 55, 96')

# The rest of the CNN...

Output: the actual dimensions are in parentheses, followed by the desired/paper dimensions

(?, 224, 224, 3) 224, 224, 3  # Input
(?, 109, 109, 96) 110, 110, 96  # Following conv1
(?, 54, 54, 96) 55, 55, 96  # Following contrast_norm1

The output height and width of a convolution operation with valid padding can be calculated as:

output_size = (input_size - kernel_size) // stride + 1

In your case, the output of the first layer is:

output_size = (224 - 7) // 2 + 1 = 217 // 2 + 1 = 109
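The formula is easy to check with a short helper (a sketch; `conv_out` is just an illustrative name, not part of TensorFlow):

```python
def conv_out(input_size, kernel_size, stride):
    # Output size of a convolution (or pooling) with "valid" padding
    return (input_size - kernel_size) // stride + 1

print(conv_out(224, 7, 2))  # 109 -- what you observe
print(conv_out(225, 7, 2))  # 110 -- what the paper reports
```

This shows that an input one pixel larger (225 instead of 224) is enough to produce the 110 x 110 output from the paper.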

One way to make the output of the first layer equal to 110 is to set the kernel size to 6x6. Another way is to add padding of size 1 using tf.pad:

import numpy as np
import tensorflow as tf

# suppose this is a batch of 10 images of size 4x4x3
data = np.ones((10, 4, 4, 3), dtype=np.float32)

paddings = [[0, 0], # no values are added along batch dim
            [1, 0], # add one value before the content of height dim
            [1, 0], # add one value before the content of width dim
            [0, 0]] # no values are added along channel dim

padded_data = tf.pad(tensor=data,
                     paddings=paddings,
                     mode='CONSTANT',
                     constant_values=0)

sess = tf.InteractiveSession()
output = sess.run(padded_data)
print(output.shape)
# >>> (10, 5, 5, 3)

# print content of first channel of first image
print(output[0,:,:,0])
# >>> [[0. 0. 0. 0. 0.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]]

In the example above, zero-padding of size 1 is added along the height and width dimensions. The paddings argument should be of shape [number_of_dimensions, 2], i.e. for each dimension of the input tensor you specify how many values to add before and after its content.

If you apply this padding to your input data, it will result in a tensor of shape batch x 225 x 225 x 3, and thus the output height and width of the convolutional layer will be 110 x 110.
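As a quick sanity check, the same padding can be reproduced with NumPy (np.pad takes the same per-dimension (before, after) pairs as tf.pad), and the conv output size then follows from the formula above:

```python
import numpy as np

# a dummy batch: 10 images of size 224x224x3
batch = np.ones((10, 224, 224, 3), dtype=np.float32)

# zero-pad 1 before height and width, mirroring the tf.pad example
padded = np.pad(batch, [(0, 0), (1, 0), (1, 0), (0, 0)], mode='constant')
print(padded.shape)  # (10, 225, 225, 3)

# valid-padding conv output: (225 - 7) // 2 + 1 = 110
print((padded.shape[1] - 7) // 2 + 1)  # 110
```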
