Keras: binary_crossentropy & categorical_crossentropy confusion

After using TensorFlow for quite a while I have read some Keras tutorials and implemented some examples. I have found several tutorials for convolutional autoencoders that use keras.losses.binary_crossentropy as the loss function.

I thought binary_crossentropy should not be a multi-class loss function and would most likely use binary labels, but in fact Keras (TF Python backend) calls tf.nn.sigmoid_cross_entropy_with_logits, which actually is intended for classification tasks with multiple, independent classes that are not mutually exclusive.
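For the record, here is a quick way to see that delegation (a sketch with made-up labels and logits): Keras' binary_crossentropy with from_logits=True reproduces tf.nn.sigmoid_cross_entropy_with_logits averaged over the outputs.

import tensorflow as tf

labels = tf.constant([[1.0, 0.0, 1.0]])   # made-up multi-label targets
logits = tf.constant([[2.0, -1.0, 0.5]])  # made-up raw model outputs

keras_loss = tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True)
tf_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits), axis=-1)
print(keras_loss.numpy(), tf_loss.numpy())  # same value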

On the other hand, I expected categorical_crossentropy to be intended for multi-class classification where the target classes have a dependency on each other, but are not necessarily one-hot encoded.

However, the Keras documentation states:

(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).
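That "categorical format" is what tf.keras.utils.to_categorical produces; for illustration:

import tensorflow as tf

labels = [2, 0, 9]  # integer class ids
one_hot = tf.keras.utils.to_categorical(labels, num_classes=10)
print(one_hot.shape)  # (3, 10): each row is all-zeros except a single 1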

If I am not mistaken, this is just the special case of one-hot encoded classification tasks, but the underlying cross-entropy loss also works with probability distributions ("multi-class", dependent labels)?

Additionally, Keras uses tf.nn.softmax_cross_entropy_with_logits (TF Python backend) for the implementation, which itself states:

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

Please correct me if I am wrong, but it looks to me that the Keras documentation is - at least - not very "detailed"?!

So, what is the idea behind Keras' naming of the loss functions? Is the documentation correct? If binary cross-entropy really relied on binary labels, it should not work for autoencoders, right?! Likewise the categorical crossentropy: it should only work for one-hot encoded labels if the documentation is correct?!

You are right in defining the areas where each of these losses is applicable:

  • binary_crossentropy (and tf.nn.sigmoid_cross_entropy_with_logits under the hood) is for binary multi-label classification (labels are independent).
  • categorical_crossentropy (and tf.nn.softmax_cross_entropy_with_logits under the hood) is for multi-class classification (classes are exclusive); see the sketch after this list.
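
A minimal sketch of the two setups (the layer sizes and the 64-dimensional input are illustrative assumptions):

import tensorflow as tf

# Multi-label: independent sigmoid outputs + binary_crossentropy
multi_label = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(5, activation="sigmoid"),   # 5 independent labels
])
multi_label.compile(optimizer="adam", loss="binary_crossentropy")

# Multi-class: softmax output + categorical_crossentropy
multi_class = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),   # 5 exclusive classes
])
multi_class.compile(optimizer="adam", loss="categorical_crossentropy")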

See also the detailed analysis in this question.

I'm not sure which tutorials you mean, so I can't comment on whether binary_crossentropy is a good or bad choice for autoencoders.

As for the naming, it is absolutely correct and reasonable. Or do you think the sigmoid and softmax names sound better?

So the only confusion left in your question is the categorical_crossentropy documentation. Note that everything that has been stated is correct: the loss supports the one-hot representation. With the TensorFlow backend, this function indeed also works with any probability distribution for the labels (in addition to one-hot vectors), and that could be mentioned in the docs, but it doesn't look critical to me. Moreover, one would need to check whether soft classes are supported in the other backends, Theano and CNTK. Remember that Keras tries to be minimalistic and targets the most popular use cases, so I can understand the logic here.
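For reference, here is a minimal check of that soft-label behavior with the TensorFlow backend (the distributions are made up for illustration):

import tensorflow as tf

y_true = tf.constant([[0.7, 0.2, 0.1]])  # a valid distribution, not one-hot
y_pred = tf.constant([[0.6, 0.3, 0.1]])

cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())

# Same value as the plain cross-entropy -sum(y_true * log(y_pred)):
print(-tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1).numpy())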

Not sure if this answers your question, but for softmax loss the output layer needs to be a probability distribution (i.e. sum to 1); for binary crossentropy loss it doesn't. Simple as that. (Binary doesn't mean that there are only 2 output classes, it just means that each output is binary.)
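For illustration, a softmax row is constrained to sum to 1 while a row of sigmoid outputs is not (made-up raw outputs):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])

softmax_out = tf.nn.softmax(logits)   # one distribution over 3 exclusive classes
sigmoid_out = tf.nn.sigmoid(logits)   # 3 independent Bernoulli probabilities

print(tf.reduce_sum(softmax_out, axis=-1).numpy())  # [1.]
print(tf.reduce_sum(sigmoid_out, axis=-1).numpy())  # generally not 1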

The documentation doesn't mention that BinaryCrossentropy can be used for multi-label classification, and that can be confusing. But it can also be used for a binary classifier (when we have only 2 exclusive classes, like cats and dogs) - see the classical example. But in this case we have to set n_classes=1:

tf.keras.layers.Dense(units=1)
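
For instance, a minimal sketch of such a single-unit classifier (the input size and hidden width here are made up):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                             # assumed feature size
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(units=1, activation="sigmoid"),   # outputs P(class == 1)
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=["accuracy"])
# Labels are then plain scalars in {0, 1}, one per sample.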

Also, BinaryCrossentropy and tf.keras.losses.binary_crossentropy have different behavior: the class reduces the per-sample losses to a single scalar (the mean by default), while the function returns one loss value per sample.

Let's look at the example from the documentation to prove that it is actually for multi-label classification.

import numpy as np
import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1], [0, 0]])
y_pred = tf.convert_to_tensor([[0.6, 0.4], [0.4, 0.6]])

bce = tf.keras.losses.BinaryCrossentropy()
loss1 = bce(y_true=y_true, y_pred=y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.81492424>

loss2 = tf.keras.losses.binary_crossentropy(y_true, y_pred)
# <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.9162905 , 0.71355796], dtype=float32)>

np.mean(loss2.numpy())
# 0.81492424

scce = tf.keras.losses.SparseCategoricalCrossentropy()
y_true = tf.convert_to_tensor([0, 0])
scce(y_true, y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.71355814>
y_true = tf.convert_to_tensor([1, 0])
scce(y_true, y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.9162907>
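
As a cross-check, the numbers above can be reproduced by hand (a plain NumPy sketch):

import numpy as np

y_true = np.array([[0, 1], [0, 0]], dtype=np.float64)
y_pred = np.array([[0.6, 0.4], [0.4, 0.6]])

# binary_crossentropy: every column is an independent Bernoulli label,
# averaged over the outputs of each row.
per_output = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(per_output.mean(axis=1))  # ~[0.9163, 0.7136] -> loss2
print(per_output.mean())        # ~0.8149           -> loss1

# SparseCategoricalCrossentropy: each row is one distribution over two
# exclusive classes, and the loss is -log(p[label]).
labels = np.array([0, 0])
print(-np.log(y_pred[np.arange(2), labels]).mean())  # ~0.7136 -> first scce value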
