
Understanding target data for softmax output layer

I found some example code for an MNIST handwritten character classification problem. The start of the code is as follows:

import tensorflow as tf

# Load in the data
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print("x_train.shape:", x_train.shape)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
r = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)

Looking at the code, it appears that the output layer of the network consists of ten nodes. If the network were working perfectly after training, then the appropriate one of the ten outputs would have an activation very close to one, and the rest should have activations very close to zero.

I knew that the training set contained 60000 example patterns. I assumed the target output data (y_train) would therefore be a 2D numpy array with a shape of 60000x10. I decided to double-check, executed print(y_train.shape), and was very surprised to see it say (60000,)... Normally you would expect the size of the target patterns to match the number of nodes in the output layer. I thought to myself, "OK, obviously softmax is an unusual special case where we only need one target"... My next thought was: how could I have known this from any documentation? So far I have failed to find anything.

I think you were searching in the wrong direction. It's not because of the softmax. The softmax function (not layer) receives n values and produces n values. It's because of the sparse_categorical_crossentropy loss.
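As a quick illustration of that point, here is a minimal numpy sketch (the `softmax` helper and the example logits are my own, not from the question's code) showing that softmax maps n inputs to n outputs:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; this does not change the result
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # n = 3 raw scores in
probs = softmax(logits)              # n = 3 probabilities out, summing to 1
```

The output has the same length as the input, so the shape surprise in the question cannot come from softmax itself.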

In the official documentation you can check that you are supposed to give the target values as integer labels. You can also see that there is an equivalent loss that takes targets with shape (60000, 10): the CategoricalCrossentropy loss.

You choose which loss to use depending on the format of your data. Since the MNIST labels are integers rather than one-hot encodings, the tutorial uses the SparseCategoricalCrossentropy loss.
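To make the relationship between the two losses concrete, here is a small numpy sketch (all array values are made up for illustration) showing that integer labels with the sparse loss compute the same quantity as one-hot labels with the dense loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake softmax outputs for 3 samples over 10 classes
probs = rng.random((3, 10))
probs /= probs.sum(axis=1, keepdims=True)

# Integer labels, shape (3,) -- what sparse_categorical_crossentropy expects
labels = np.array([3, 7, 1])

# One-hot labels, shape (3, 10) -- what categorical_crossentropy expects
one_hot = np.eye(10)[labels]

# Sparse form: pick out the probability of the true class directly
sparse_loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Dense form: sum over all classes; one_hot zeros out every wrong class
dense_loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
```

In Keras you could convert between the two representations with `tf.keras.utils.to_categorical(y_train, 10)` and switch the loss to `'categorical_crossentropy'`; training is otherwise unchanged.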

