
Why are the MNIST images 1x28x28 tensors?

I converted the MNIST images, which are 28x28 pixels, into tensors with

dataset = MNIST(root='data/', train=True, transform=transforms.ToTensor())

and when I run

img_tensor, label = dataset[0]
print(img_tensor.shape, label)

It says the shape is torch.Size([1, 28, 28]). Why is it 1x28x28? What does the first dimension mean, and what is the point of 1x28x28 as opposed to 28x28?

An image stored as a tensor has three dimensions: channels, height, and width. The two 28s are the height and width. The 1 is the channel dimension. So what is a channel? In a color image, every pixel is represented by three color values: red, green, and blue. Each color gets its own channel, so a color image normally has 3 channels (RGB) and shape (3, H, W). So why do you have a 1 there? Because the MNIST images are grayscale, a single intensity value per pixel is enough, so one channel suffices and the shape is (1, H, W). The picture below visualizes the channels of an RGB image: [image]

source: https://commons.wikimedia.org/wiki/File:RGB_channels_separation.png

So you see, for grayscale images you only need one channel. A plain 28x28 matrix would hold the same data, but PyTorch's image layers (convolutions, for example) expect an explicit channel dimension, and ToTensor() always adds it.
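To see that the leading 1 carries no extra pixel data, here is a minimal sketch. It uses a random tensor as a stand-in for an MNIST image (an assumption, so nothing has to be downloaded); `squeeze` and `unsqueeze` are the standard torch calls for removing and adding a size-1 dimension.

```python
import torch

# Stand-in for one MNIST image as returned by ToTensor(): (channels, height, width).
img_tensor = torch.rand(1, 28, 28)

# Dropping the channel dimension gives a plain 28x28 matrix (handy for plotting).
img_2d = img_tensor.squeeze(0)
print(img_2d.shape)      # torch.Size([28, 28])

# Adding it back restores the layout that image layers expect.
restored = img_2d.unsqueeze(0)
print(restored.shape)    # torch.Size([1, 28, 28])

# Same pixel values either way; only the shape differs.
same = torch.equal(restored, img_tensor)
print(same)              # True
```

So the 1x28x28 form costs nothing extra; it just makes the channel count explicit.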

The full layout is (B, C, H, W): batch, channel, height, and width. This is the layout PyTorch convolutions operate on.
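As a rough illustration of that layout, the sketch below passes a batch through a convolution. The layer sizes and the random input are made up for the example; PyTorch documents the expected input of `nn.Conv2d` as (N, C, H, W), where N is the batch size.

```python
import torch
from torch import nn

# Hypothetical layer for illustration: 1 input channel (grayscale), 8 output channels.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

# Stand-in for a batch of 16 MNIST images: (batch, channels, height, width).
batch = torch.rand(16, 1, 28, 28)
out = conv(batch)
print(out.shape)         # torch.Size([16, 8, 28, 28])

# A single (1, 28, 28) image can be given a batch dimension with unsqueeze(0).
single = torch.rand(1, 28, 28).unsqueeze(0)
single_out = conv(single)
print(single_out.shape)  # torch.Size([1, 8, 28, 28])
```

With `in_channels=1` the layer matches the single grayscale channel; an RGB dataset would need `in_channels=3` instead.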

The first dimension tracks color channels. The second and third dimensions represent pixels along the height and width of the image, respectively. Since images in the MNIST dataset are grayscale, there's just one channel. Other datasets have images with color, in which case there are three channels: red, green, and blue (RGB).

