Question 1.
Let us say the image has a shape of (batch_size=100, height=28, width=28, channels=1), and we feed this image into the CNN model below:
class CNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # ImgIn shape = (100, 28, 28, 1)
        # Image shape after Conv -> (100, 28, 28, 32)
        # Image shape after Pool -> (100, 14, 14, 32)
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2))

        # second layer
        # ImgIn shape = (100, 14, 14, 32)
        # Image shape after Conv -> (100, 14, 14, 64)
        # Image shape after Pool -> (100, 7, 7, 64)
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2))

        # THIS PART CONFUSES ME!!!
        self.fc = torch.nn.Linear(7 * 7 * 64, 10, bias=True)
        torch.nn.init.xavier_uniform_(self.fc.weight)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten to (batch_size, 7 * 7 * 64)
        out = self.fc(out)
        return out
What happens to the batch size of the image after self.fc?
Is the output of size (batch_size=100, 10)?
Question 2.
Also, I am confused about mini-batches. If there are 5 mini-batches and the losses are [5, -5, 4, -4, 0], then the average loss will be zero. Will the neural network stop training even though there is a loss in each mini-batch?
Question 3. Can a neural network be expressed by complex matrix multiplication?
Q1
self.fc is just a single linear layer. The key line here is out.view(out.size(0), -1), which is nothing but a flatten (a reshape in NumPy terms), where out.size(0) is your batch size and -1 stands for all remaining elements of the tensor (see torch.Tensor.view). In other words, this line transforms the 4-d conv output of shape (Batch, C, H, W) into a 2-d tensor of shape (Batch, 7 * 7 * 64). Finally, your fc layer outputs (Batch, 10), so the batch size is preserved.
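A quick way to see this is to push a dummy tensor through the model and print the shape after each step (a sketch, assuming the CNN class from the question; note that PyTorch expects channels-first input, i.e. (Batch, Channels, Height, Width)):

import torch

model = CNN()                        # the class defined in the question
x = torch.randn(100, 1, 28, 28)      # PyTorch convention: (Batch, Channels, Height, Width)

out = model.layer1(x)
print(out.shape)                     # torch.Size([100, 32, 14, 14])
out = model.layer2(out)
print(out.shape)                     # torch.Size([100, 64, 7, 7])
out = out.view(out.size(0), -1)
print(out.shape)                     # torch.Size([100, 3136]), i.e. (100, 7 * 7 * 64)
out = model.fc(out)
print(out.shape)                     # torch.Size([100, 10]) -- batch size unchanged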
Q2
This question is confusing. How did you get this loss? It seems that a part of your code is missing.
Q3
A linear multilayer perceptron is essentially a matrix multiplication. A nonlinear activation function can then be applied to the output of each layer (which is itself the result of a matrix multiplication).
A CNN can also be seen as a kind of matrix operation; this particular operation is called convolution.
So, since your network is not purely linear (it contains ReLU activations and pooling), you cannot express it as a single matrix multiplication.
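As a small illustration (a minimal sketch; the layer sizes are made up for the example), a Linear layer really is just a matrix multiplication plus a bias, but a ReLU in between stops the layers from collapsing into one matrix:

import torch

fc = torch.nn.Linear(4, 3)               # illustrative sizes
x = torch.randn(2, 4)

# nn.Linear computes x @ W^T + b, i.e. a plain matrix multiplication plus a bias
manual = x @ fc.weight.t() + fc.bias
print(torch.allclose(fc(x), manual))      # True

# Two linear layers with no activation in between collapse into a single matrix
# product, but with a ReLU in between they no longer do.
fc2 = torch.nn.Linear(3, 2)
stacked = fc2(torch.relu(fc(x)))          # cannot be rewritten as a single x @ M + c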