How is the categorical_crossentropy implemented in keras?
I'm trying to apply the concept of distillation: basically, to train a new, smaller network that does the same job as the original one but with less computation.
I have the softmax outputs for every sample instead of the logits.
My question is: how is the categorical cross-entropy loss function implemented? Does it take the maximum value of the original labels and multiply it by the corresponding predicted value at the same index, or does it sum over all the classes of the (one-hot) label vector, as the formula says:

loss = -sum_i( target_i * log(output_i) )
I see that you used the tensorflow tag, so I guess this is the backend you are using?
def categorical_crossentropy(output, target, from_logits=False):
    """Categorical crossentropy between an output tensor and a target tensor.

    # Arguments
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        target: A tensor of the same shape as `output`.
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.

    # Returns
        Output tensor.
    """
This code comes from the keras source code. Looking directly at the code should answer all your questions :) If you need more info just ask!
EDIT:
Here is the code that interests you:
# Note: tf.nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output,
                            reduction_indices=len(output.get_shape()) - 1,
                            keep_dims=True)
    # manual computation of crossentropy
    epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
    output = tf.clip_by_value(output, epsilon, 1. - epsilon)
    return -tf.reduce_sum(target * tf.log(output),
                          reduction_indices=len(output.get_shape()) - 1)
If you look at the return, you can see they sum over the last axis, i.e. over all the classes... :)
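For illustration, here is a minimal NumPy sketch of that same computation (the values and the epsilon constant are my own placeholders, not from the Keras source). Note that a soft target vector, like the teacher softmax outputs you have for distillation, works just as well as a one-hot one:

import numpy as np

_EPSILON = 1e-7  # placeholder; plays the same role as Keras' _EPSILON

# soft targets (e.g. a teacher's softmax output) and the student's predictions
target = np.array([0.7, 0.2, 0.1])
output = np.array([0.6, 0.3, 0.1])

# same three steps as the Keras snippet above
output = output / output.sum()                     # normalize so probas sum to 1
output = np.clip(output, _EPSILON, 1. - _EPSILON)  # avoid log(0)
loss = -np.sum(target * np.log(output))            # sum over ALL classes

print(loss)  # ~0.829 -- every class contributes, not just the argmax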
As an answer to "Do you happen to know what the epsilon and tf.clip_by_value are doing?": the clipping keeps output strictly between epsilon and 1 - epsilon, i.e. it ensures output != 0, because tf.log(0) evaluates to -inf, which would poison the loss and its gradients.
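A toy snippet (my own, not from the Keras source) showing what goes wrong without the clipping:

import numpy as np

output = np.array([1.0, 0.0])   # a predicted probability of exactly 0
print(np.log(output))           # [ 0. -inf]  -> the loss becomes inf/nan

epsilon = 1e-7
clipped = np.clip(output, epsilon, 1. - epsilon)
print(np.log(clipped))          # [~0. -16.12] -> finite, well-behaved values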
(I don't have points to comment but thought I'd contribute)