pytorch : unable to understand model.forward function

I am learning deep learning and am trying to understand the PyTorch code given below. I'm struggling to understand how the probability calculation works. Could someone break it down in layman's terms? Thanks a ton.

ps = model.forward(images[0,:])

from torch import nn

# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)

# Forward pass through the network and display output
images, labels = next(iter(trainloader))  # trainloader is a DataLoader defined earlier (not shown here)
images.resize_(images.shape[0], 1, 784)
print(images.shape)
ps = model.forward(images[0,:])

I'm a layman, so I'll help you with the layman's terms :)

input_size = 784
hidden_sizes = [128, 64]
output_size = 10

These are the parameters for the layers in your network. Every neural network consists of layers, and each layer has an input and an output shape.

Specifically, input_size deals with the input shape of the first layer; it is the input size of the entire network. Each sample fed into the network will be a 1-dimensional vector of length 784 (an array that is 784 long).

hidden_sizes deals with the shapes inside the network. We will cover this a little later.

output_size deals with the output shape of the last layer. This means that our network will output a 1-dimensional vector of length 10 for each sample.
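To make those three sizes concrete, here is a quick sketch of how one sample's shape flows through the layers (the Linear layers here are stand-ins for the ones defined below):

import torch
from torch import nn

x = torch.randn(1, 784)      # one flattened sample: input_size
x = nn.Linear(784, 128)(x)   # -> shape (1, 128), hidden_sizes[0]
x = nn.Linear(128, 64)(x)    # -> shape (1, 64),  hidden_sizes[1]
x = nn.Linear(64, 10)(x)     # -> shape (1, 10),  output_size
print(x.shape)               # torch.Size([1, 10])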

Now to break down the model definition line by line:

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),

The nn.Sequential part simply defines a network; each argument passed in defines a new layer in that network, in that order.

nn.Linear(input_size, hidden_sizes[0]) is an example of such a layer. It is the first layer of our network: it takes in an input of size input_size and outputs a vector of size hidden_sizes[0]. The size of that output is considered "hidden" in that it is not the input or the output of the whole network. It is "hidden" because it is located inside the network, far from the input and output ends that you interact with when you actually use it.

This is called Linear because it applies a linear transformation by multiplying the input by its weights matrix and adding its bias matrix to the result (Y = Ax + b, where Y = output, x = input, A = weights, b = bias).
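A minimal sketch of that computation, checking the layer's output against Y = Ax + b done by hand:

import torch
from torch import nn

layer = nn.Linear(784, 128)                   # A is layer.weight (128x784), b is layer.bias
x = torch.randn(1, 784)                       # one sample

y = layer(x)                                  # what the layer computes
y_by_hand = x @ layer.weight.T + layer.bias   # Y = Ax + b, written out manually
print(torch.allclose(y, y_by_hand))           # True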

nn.ReLU(),

ReLU is an example of an activation function. What this function does is apply some transformation to the output of the previous layer (the layer discussed above) and pass along the result of that transformation. In this case the function being used is the ReLU function, which is defined as ReLU(x) = max(x, 0). Activation functions are used in neural networks because they create non-linearities. This allows your model to capture non-linear relationships.
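For example, ReLU simply zeroes out every negative entry (the input values here are made up):

import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(nn.ReLU()(x))   # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])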

nn.Linear(hidden_sizes[0], hidden_sizes[1]),

From what we discussed above, this is another example of a layer. It takes an input of size hidden_sizes[0] (the same shape as the output of the previous layer) and outputs a 1-dimensional vector of length hidden_sizes[1].

nn.ReLU(),

Applies the ReLU function again.

nn.Linear(hidden_sizes[1], output_size)

Same as the two Linear layers above, but our output shape is output_size this time.

nn.Softmax(dim=1))

Another activation function. This activation function turns the logits output by nn.Linear into an actual probability distribution. This lets the model output the probability for each class. At this point our model is built.
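To make "probability distribution" concrete, here is a small sketch (the logit values are made up):

import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores as they come out of a Linear layer
probs = nn.Softmax(dim=1)(logits)
print(probs)         # tensor([[0.6590, 0.2424, 0.0986]])
print(probs.sum())   # non-negative entries summing to 1 (up to floating point)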

# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)

These lines simply preprocess the training data and put it into the correct format: resize_ flattens each image into a length-784 vector so that it matches input_size.
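A sketch of the shapes involved, assuming a batch of 64 single-channel 28x28 images (28 * 28 = 784):

import torch

images = torch.randn(64, 1, 28, 28)        # stand-in batch of image tensors
images.resize_(images.shape[0], 1, 784)    # flatten each image in place
print(images.shape)                        # torch.Size([64, 1, 784])
print(images[0,:].shape)                   # torch.Size([1, 784]) -- one sample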

ps = model.forward(images[0,:])

This passes the image through the model (a forward pass), applying layer by layer the operations discussed above. You get the resulting output: ps is the vector of 10 class probabilities.
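A quick sketch of inspecting that output (note that calling model(images[0,:]) is the more idiomatic way to run a forward pass than calling model.forward directly):

ps = model(images[0,:])    # same result as model.forward(images[0,:])
print(ps.shape)            # torch.Size([1, 10]) -- one probability per class
print(ps.sum())            # sums to 1 thanks to Softmax
print(ps.argmax(dim=1))    # index of the most likely class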
