
What is the logic that is being used to convert 28X28 input image to 32X32 image in LeNet?

I am using the MNIST dataset, whose images are 28 x 28 pixels. I am using padding to convert them to 32 x 32 pixels, as shown below:

tf.pad(tensor=X_train, paddings=[[0, 0], [2,2], [2,2]])

The output shape is correct:

TensorShape([60000, 32, 32])

I want to understand what exactly [0, 0], [2, 2], and [2, 2] mean. Which of these numbers are the top, bottom, left, and right padding here? What do the numbers represent?

From https://www.tensorflow.org/api_docs/python/tf/pad :

This operation pads a tensor according to the paddings you specify. paddings is an integer tensor with shape [n, 2], where n is the rank of tensor. For each dimension D of input, paddings[D, 0] indicates how many values to add before the contents of tensor in that dimension, and paddings[D, 1] indicates how many values to add after the contents of tensor in that dimension.

Here you have a rank 3 tensor. Dimension 0 is the batch dimension, along which the 60,000 individual 28 x 28 images are stacked. Dimensions 1 and 2 correspond to the height and width of each image. In each of those dimensions, you add 2 elements before and 2 after the original rows/columns, which makes the output size 28 + 2 + 2 = 32.

For example, the top and bottom padding is specified by paddings[1] which pads the 28 x 28 tensor with 2 zeros at the top and 2 zeros at the bottom. Similarly, paddings[2] provides the left and right padding amounts.
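To see which entry controls which side, here is a small sketch (not from the original answer) that pads asymmetrically along the row dimension only, using a tensor of ones so the zero padding stands out:

```python
import tensorflow as tf

# A tiny 1 x 2 x 2 tensor of ones, so padded zeros are easy to spot.
X = tf.ones(shape=[1, 2, 2])

# paddings[1] = [1, 3]: 1 zero row *before* (top) and 3 zero rows
# *after* (bottom) along dimension 1; batch and columns are untouched.
Y = tf.pad(tensor=X, paddings=[[0, 0], [1, 3], [0, 0]])
print(Y.shape)  # (1, 6, 2): 2 original rows + 1 top + 3 bottom
```

Because the result has 1 zero row at the top and 3 at the bottom, it is clear that the first number in each pair is the "before" (top/left) amount and the second is the "after" (bottom/right) amount.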

Look at this example for a clearer understanding:

>>> import tensorflow as tf
>>> # create a random tensor of shape 2 x 2 x 2
>>> X = tf.random.uniform(shape=[2, 2, 2])
>>> X
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[0.60002756, 0.5554304 ],
        [0.15563118, 0.75253165]],

       [[0.983318  , 0.4908601 ],
        [0.16791439, 0.55565095]]], dtype=float32)>

# pad along batch dimension
>>> tf.pad(tensor = X, paddings = [[1, 1], [0, 0], [0, 0]])
<tf.Tensor: shape=(4, 2, 2), dtype=float32, numpy=
array([[[0.        , 0.        ],
        [0.        , 0.        ]],

       [[0.60002756, 0.5554304 ],
        [0.15563118, 0.75253165]],

       [[0.983318  , 0.4908601 ],
        [0.16791439, 0.55565095]],

       [[0.        , 0.        ],
        [0.        , 0.        ]]], dtype=float32)>


# pad along height/rows
>>> tf.pad(tensor = X, paddings = [[0, 0], [1, 1], [0, 0]])
<tf.Tensor: shape=(2, 4, 2), dtype=float32, numpy=
array([[[0.        , 0.        ],
        [0.60002756, 0.5554304 ],
        [0.15563118, 0.75253165],
        [0.        , 0.        ]],

       [[0.        , 0.        ],
        [0.983318  , 0.4908601 ],
        [0.16791439, 0.55565095],
        [0.        , 0.        ]]], dtype=float32)>


# pad along width/columns
>>> tf.pad(tensor = X, paddings = [[0, 0], [0, 0], [1, 1]])
<tf.Tensor: shape=(2, 2, 4), dtype=float32, numpy=
array([[[0.        , 0.60002756, 0.5554304 , 0.        ],
        [0.        , 0.15563118, 0.75253165, 0.        ]],

       [[0.        , 0.983318  , 0.4908601 , 0.        ],
        [0.        , 0.16791439, 0.55565095, 0.        ]]], dtype=float32)>

Note how the tensor shapes change above after each kind of padding operation.

Since in your case you do not want redundant zeroed samples along the batch, you specify [0, 0] for the batch dimension.
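Putting it all together for your MNIST case, here is a minimal sketch (a zero tensor stands in for the real `X_train`, which `tf.keras.datasets.mnist.load_data()` would provide):

```python
import tensorflow as tf

# Stand-in for the MNIST training set: 60,000 images of 28 x 28 pixels.
X_train = tf.zeros(shape=[60000, 28, 28])

# [0, 0]: no padding before/after along the batch dimension,
# [2, 2]: 2 zero rows at the top and 2 at the bottom,
# [2, 2]: 2 zero columns on the left and 2 on the right.
X_padded = tf.pad(tensor=X_train, paddings=[[0, 0], [2, 2], [2, 2]])
print(X_padded.shape)  # (60000, 32, 32)
```

This yields the 32 x 32 inputs that the original LeNet architecture expects.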
