Implementing binary cross-entropy loss gives a different answer than TensorFlow

Question

I am implementing the binary cross-entropy loss function in raw Python, but it gives me a very different answer than TensorFlow. This is the answer I got from TensorFlow:

import numpy as np
from tensorflow.keras.losses import BinaryCrossentropy

y_true = np.array([1., 1., 1.])
y_pred = np.array([1., 1., 0.])
bce = BinaryCrossentropy()
loss = bce(y_true, y_pred)
print(loss.numpy())

Output:

>>> 5.1416497230529785

From my knowledge, the formula for binary cross-entropy is:

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

I implemented the same formula in raw Python as follows:

def BinaryCrossEntropy(y_true, y_pred):
    m = y_true.shape[1]
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    # Calculating loss
    loss = -1/m * (np.dot(y_true.T, np.log(y_pred)) + np.dot((1 - y_true).T, np.log(1 - y_pred)))

    return loss

print(BinaryCrossEntropy(np.array([1, 1, 1]).reshape(-1, 1), np.array([1, 1, 0]).reshape(-1, 1)))

But this function gives me a loss value of:

>>> [[16.11809585]]

How can I get the right answer?

Answer 1

In the constructor of tf.keras.losses.BinaryCrossentropy(), you'll notice:

tf.keras.losses.BinaryCrossentropy(
    from_logits=False, label_smoothing=0, reduction=losses_utils.ReductionV2.AUTO,
    name='binary_crossentropy'
)

The default reduction argument will most probably resolve to Reduction.SUM_OVER_BATCH_SIZE (in almost all usage contexts, AUTO defaults to it). Assume that the shape of our model outputs is [1, 3], meaning our batch size is 1 and the output dimension is 3 (this does not imply that there are 3 classes). We then need to average the element-wise losses: Keras averages over the last axis to get one loss per sample, and SUM_OVER_BATCH_SIZE then averages those over the 0th axis, i.e. the batch dimension. For this input, that is simply the mean of all three elements.

I'll make it clear with code:

import tensorflow as tf
import numpy as np

y_true = np.array( [1., 1., 1.] ).reshape( 1 , 3 )
y_pred = np.array( [1., 1., 0.] ).reshape( 1 , 3 )

bce = tf.keras.losses.BinaryCrossentropy( from_logits=False , reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE )
loss = bce( y_true, y_pred )

print(loss.numpy())

The output is:

5.1416497230529785
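To see what the reduction setting changes once the batch holds more than one sample, here is a NumPy-only sketch. It assumes Keras computes the element-wise loss by clipping y_pred to [eps, 1 - eps] and adding eps inside the log (the same computation this answer walks through next), averages over the last axis to get one loss per sample, and then applies the reduction over the batch. The helper names and the lowercase reduction strings are hypothetical, chosen only to mirror the Reduction constants.

```python
import numpy as np

EPS = 1e-7  # assumed to equal the default tf.keras.backend.epsilon()


def elementwise_bce(y_true, y_pred, eps=EPS):
    """Element-wise BCE: clip y_pred, then take log(x + eps) for both terms."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred + eps)
             + (1 - y_true) * np.log(1 - y_pred + eps))


def reduced_bce(y_true, y_pred, reduction="sum_over_batch_size"):
    """Hypothetical helper mimicking Keras reductions for (batch, outputs) arrays."""
    # One loss per sample: mean over the last (output) axis.
    per_sample = elementwise_bce(y_true, y_pred).mean(axis=-1)
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return per_sample.sum()
    if reduction == "sum_over_batch_size":
        return per_sample.sum() / per_sample.size
    raise ValueError(f"unknown reduction: {reduction}")


y_true = np.array([[1., 1., 1.],
                   [1., 1., 1.]])
y_pred = np.array([[1., 1., 0.],   # the sample from the question
                   [1., 1., 1.]])  # a second, perfectly predicted sample

print(reduced_bce(y_true, y_pred, "none"))                 # roughly [5.1416, 0.0]
print(reduced_bce(y_true, y_pred, "sum"))                  # roughly 5.1416
print(reduced_bce(y_true, y_pred, "sum_over_batch_size"))  # roughly 2.5708
```

With a batch of one, all three reductions give the same number, which is why the [1, 3] example above cannot distinguish them.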

The expression for binary cross-entropy is the same as the one in the question, where N refers to the batch size.

We now implement BCE ourselves. First, we clip the outputs of our model, setting the minimum to tf.keras.backend.epsilon() and the maximum to 1 - tf.keras.backend.epsilon(). The default value of tf.keras.backend.epsilon() is 1e-7.

y_pred = np.clip( y_pred , tf.keras.backend.epsilon() , 1 - tf.keras.backend.epsilon() )

Using the expression for BCE:

p1 = y_true * np.log( y_pred + tf.keras.backend.epsilon() )
p2 = ( 1 - y_true ) * np.log( 1 - y_pred + tf.keras.backend.epsilon() )

print( p1 )
print( p2 )

The output:

[[  0.           0.         -15.42494847]]
[[-0. -0.  0.]]

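Notice that the shapes are still preserved. In your implementation, by contrast, each np.dot sums its term over the three elements and returns an array of shape [1, 1]; that sum is then divided by m = y_true.shape[1], which is 1 for inputs reshaped to [3, 1], instead of by the 3 elements you meant to average over.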
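If you prefer to keep the np.dot formulation from the question, here is a minimal sketch of the fix. It assumes column-vector inputs of shape (N, 1), divides by y_true.shape[0] instead of y_true.shape[1], and adds the epsilon inside the log after clipping, as the p1 and p2 lines above do. The function name binary_cross_entropy_dot is only illustrative.

```python
import numpy as np


def binary_cross_entropy_dot(y_true, y_pred, eps=1e-7):
    """BCE for column vectors of shape (N, 1), using np.dot as in the question."""
    y_pred = np.asarray(y_pred, dtype=float)
    m = y_true.shape[0]  # number of samples N (y_true.shape[1] is always 1 here)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Add eps inside the log after clipping, matching p1 and p2 above.
    log_p = np.log(y_pred + eps)
    log_1mp = np.log(1 - y_pred + eps)
    # (1, N) @ (N, 1) -> (1, 1): each dot product sums its term over the N samples.
    total = np.dot(y_true.T, log_p) + np.dot((1 - y_true).T, log_1mp)
    return -total.item() / m


y_true = np.array([1, 1, 1]).reshape(-1, 1)
y_pred = np.array([1, 1, 0]).reshape(-1, 1)
print(binary_cross_entropy_dot(y_true, y_pred))  # roughly 5.14165
```

Because np.dot already sums over the samples, dividing by N yields the mean; .item() just unwraps the (1, 1) result into a plain float.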

Finally, we add them and take their mean with np.mean() (with no axis argument it averages over all elements, which for a batch of 1 matches the reduction described above):

o  = -np.mean( p1 + p2 )
print( o )

The output is:

5.141649490132791
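Dividing by the right count is not the whole story: the epsilon added inside the log (the + tf.keras.backend.epsilon() in p1 and p2 above) also matters. For the one element with y_true = 1 and y_pred = 0, clipping alone gives log(1e-7), while clipping and then adding epsilon gives log(2e-7). A quick arithmetic check, assuming epsilon is the default 1e-7 and the other two elements contribute essentially nothing:

```python
import numpy as np

n = 3       # number of elements averaged over
eps = 1e-7  # clip bound, assumed equal to tf.keras.backend.epsilon()

# Only the element with y_true = 1, y_pred = 0 contributes noticeably.
without_eps_in_log = -np.log(eps) / n      # clip only, as in the question
with_eps_in_log = -np.log(eps + eps) / n   # clip, then log(y_pred + eps), as above

print(without_eps_in_log)  # roughly 5.3727
print(with_eps_in_log)     # roughly 5.1416
```

So even with the divisor fixed, the clip-only version from the question would report about 5.3727 rather than TensorFlow's 5.1416.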

You can find the problem in your implementation by printing the shape of each term.
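For example, here is a sketch that replays the question's computation on the same (3, 1) inputs and prints the intermediate shapes. The names term_1 and term_0 are just labels for the question's two np.dot products.

```python
import numpy as np

y_true = np.array([1, 1, 1]).reshape(-1, 1)
y_pred = np.clip(np.array([1, 1, 0]).reshape(-1, 1).astype(float), 1e-7, 1 - 1e-7)

m = y_true.shape[1]  # what the question divides by
term_1 = np.dot(y_true.T, np.log(y_pred))
term_0 = np.dot((1 - y_true).T, np.log(1 - y_pred))

print("y_true:", y_true.shape)      # (3, 1): three samples in a column
print("y_true.T:", y_true.T.shape)  # (1, 3)
print("term_1:", term_1.shape)      # (1, 1): the dot product already sums over samples
print("m:", m)                      # 1, so dividing by m averages nothing
```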

Answer 2

There are some issues with your implementation. Here is a corrected version using NumPy:

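import numpy as np

def BinaryCrossEntropy(y_true, y_pred):
    # Clip predictions away from 0 and 1 so the logs stay finite
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    term_0 = (1 - y_true) * np.log(1 - y_pred + 1e-7)
    term_1 = y_true * np.log(y_pred + 1e-7)
    # Average over axis 0, i.e. over the samples of the (N, 1) column vectors
    return -np.mean(term_0 + term_1, axis=0)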

print(BinaryCrossEntropy(np.array([1, 1, 1]).reshape(-1, 1), 
                         np.array([1, 1, 0]).reshape(-1, 1)))
[5.14164949]

Note that during tf.keras model training, it's better to use Keras backend functions. You can implement it the same way using the Keras backend utilities:

import numpy as np
from tensorflow.keras import backend as K

def BinaryCrossEntropy(y_true, y_pred):
    # K.epsilon() defaults to 1e-7
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())
    term_1 = y_true * K.log(y_pred + K.epsilon())
    return -K.mean(term_0 + term_1, axis=0)

print(BinaryCrossEntropy(
    np.array([1., 1., 1.]).reshape(-1, 1), 
    np.array([1., 1., 0.]).reshape(-1, 1)
    ).numpy())
[5.14164949]


2 answers

Answer 1
1 vote, 2021-05-20 08:37:01

Answer 2
0 votes, accepted, 2021-05-20 08:10:52
