Understanding target data for softmax output layer
I found some example code for an MNIST hand-written character classification problem. The start of the code is as follows:
import tensorflow as tf
# Load in the data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print("x_train.shape:", x_train.shape)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
r = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
Looking at the code, it appears that the output layer of the network consists of ten nodes. If the network were working perfectly after training, then (the appropriate) one of the ten outputs would have an activation very close to one, and the rest would have activations very close to zero.
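To make that concrete, here is a small sketch (plain NumPy, with made-up logits standing in for the network's pre-softmax outputs) of what one row of a ten-node softmax output looks like:

```python
import numpy as np

# Hypothetical logits for one input image (10 classes)
logits = np.array([1.2, -0.3, 4.0, 0.1, 0.5, -1.0, 2.2, 0.0, 0.7, -0.5])

# Numerically stable softmax: shift by the max before exponentiating
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(np.isclose(probs.sum(), 1.0))  # True: the ten activations sum to 1
print(probs.argmax())                # 2: index of the largest logit
```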
I knew that the training set contained 60000 example patterns. I assumed that the target output data (y_train) would therefore be a 2D NumPy array with a shape of 60000x10. I decided to double-check and executed

print(y_train.shape)

and was very surprised to see it say (60000,). Normally you would expect the size of the target patterns to be the same as the number of nodes in the output layer. I thought to myself, "OK, well obviously softmax is an unusual special case where we only need one target"... My next thought was: how could I have known this from any documentation? So far I have failed to find anything.
I think you were searching in the wrong direction. It's not because of the softmax. The softmax function (not layer) receives n values and produces n values. It's because of the

sparse_categorical_crossentropy

loss.
In the official documentation you can check that you are supposed to give the target values as integer labels. You can also see that there is an otherwise identical loss, CategoricalCrossentropy, that uses targets of shape

(60000, 10)

instead.
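The two losses compute the same quantity; they just accept the targets in different shapes. A minimal sketch in pure NumPy (a made-up batch of 2 examples and 3 classes, not the real MNIST data) showing the equivalence:

```python
import numpy as np

# Hypothetical softmax outputs for 2 examples over 3 classes (rows sum to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

sparse_targets = np.array([0, 2])             # integer labels, shape (2,)
one_hot_targets = np.eye(3)[sparse_targets]   # one-hot labels, shape (2, 3)

# sparse_categorical_crossentropy: index directly with the integer label
sparse_loss = -np.log(probs[np.arange(len(sparse_targets)), sparse_targets])

# categorical_crossentropy: sum over the one-hot row (only one term is nonzero)
categorical_loss = -np.sum(one_hot_targets * np.log(probs), axis=1)

print(np.allclose(sparse_loss, categorical_loss))  # True: identical losses
```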
You choose which loss to use depending on the format of your data. Since the MNIST data is labeled with integers instead of one-hot encodings, the tutorial uses the SparseCategoricalCrossentropy loss.
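If you would rather have targets of shape (60000, 10), you can one-hot encode the integer labels yourself and compile with categorical_crossentropy instead. A sketch of the conversion, done here in plain NumPy on a hypothetical five-label slice standing in for y_train (tf.keras.utils.to_categorical does the same job):

```python
import numpy as np

# Hypothetical integer labels standing in for y_train (real shape: (60000,))
y_train = np.array([5, 0, 4, 1, 9])

# One-hot encode: row i gets a 1 in column y_train[i], zeros elsewhere
num_classes = 10
y_train_onehot = np.eye(num_classes, dtype="float32")[y_train]

print(y_train_onehot.shape)  # (5, 10) -- would be (60000, 10) for the full set
# With targets in this shape you would compile with
# loss='categorical_crossentropy' instead of 'sparse_categorical_crossentropy'.
```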