Implementing Binary Cross Entropy loss gives different answer than Tensorflow's
I am implementing the Binary Cross-Entropy loss function in raw Python, but it gives me a very different answer than Tensorflow. This is the answer I got from Tensorflow:
import numpy as np
from tensorflow.keras.losses import BinaryCrossentropy
y_true = np.array([1., 1., 1.])
y_pred = np.array([1., 1., 0.])
bce = BinaryCrossentropy()
loss = bce(y_true, y_pred)
print(loss.numpy())
Output:
>>> 5.1416497230529785
From my knowledge, the formula of Binary Cross Entropy is this:
loss = -(1/N) * sum_{i=1..N} [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]
I implemented the same in raw Python as follows:
def BinaryCrossEntropy(y_true, y_pred):
    m = y_true.shape[1]
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    # Calculating loss
    loss = -1/m * (np.dot(y_true.T, np.log(y_pred)) + np.dot((1 - y_true).T, np.log(1 - y_pred)))
    return loss
print(BinaryCrossEntropy(np.array([1, 1, 1]).reshape(-1, 1), np.array([1, 1, 0]).reshape(-1, 1)))
But from this function I get the loss value:
>>> [[16.11809585]]
How can I get the right answer?
In the constructor of tf.keras.losses.BinaryCrossentropy(), you'll notice,
tf.keras.losses.BinaryCrossentropy(
    from_logits=False, label_smoothing=0, reduction=losses_utils.ReductionV2.AUTO,
    name='binary_crossentropy'
)
The default argument reduction will most probably have the value Reduction.SUM_OVER_BATCH_SIZE, as mentioned here. Assume that the shape of our model outputs is [ 1, 3 ], meaning our batch size is 1 and the output dimension is 3 (this does not imply that there are 3 classes). Keras averages the element-wise losses over the last axis and then over the batch dimension, so here the loss is the mean of the three element-wise values.
I'll make it clear with the code,
import tensorflow as tf
import numpy as np
y_true = np.array( [1., 1., 1.] ).reshape( 1 , 3 )
y_pred = np.array( [1., 1., 0.] ).reshape( 1 , 3 )
bce = tf.keras.losses.BinaryCrossentropy( from_logits=False , reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE )
loss = bce( y_true, y_pred )
print(loss.numpy())
The output is,
5.1416497230529785
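To illustrate the reduction with more than one sample, here is a minimal NumPy-only sketch. The batch of two samples and its values are made up for illustration, and the two-step averaging reflects my understanding of how Keras applies SUM_OVER_BATCH_SIZE to this loss, not code taken from the library.

```python
import numpy as np

# Hypothetical batch: 2 samples, 3 outputs each (values chosen for illustration only).
y_true = np.array([[1., 1., 1.],
                   [0., 1., 0.]])
y_pred = np.array([[0.9, 0.8, 0.3],
                   [0.2, 0.6, 0.1]])

# Element-wise binary cross-entropy, same shape as the inputs: (2, 3).
elementwise = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Step 1: average over the last axis, giving one loss per sample, shape (2,).
per_sample = elementwise.mean(axis=-1)

# Step 2: average the per-sample losses over the batch, giving a scalar.
batch_loss = per_sample.mean()

print(elementwise.shape, per_sample.shape, batch_loss)
```

Because every sample has the same number of outputs, the two-step average equals the plain mean over all elements, which is why a single np.mean() call is enough in the manual version.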
The expression for Binary Crossentropy is the same as mentioned in the question. N refers to the batch size.
We now implement BCE on our own. First, we clip the outputs of our model, setting the minimum to tf.keras.backend.epsilon() and the maximum to 1 - tf.keras.backend.epsilon(). The value of tf.keras.backend.epsilon() is 1e-7.
y_pred = np.clip( y_pred , tf.keras.backend.epsilon() , 1 - tf.keras.backend.epsilon() )
Using the expression for BCE,
p1 = y_true * np.log( y_pred + tf.keras.backend.epsilon() )
p2 = ( 1 - y_true ) * np.log( 1 - y_pred + tf.keras.backend.epsilon() )
print( p1 )
print( p2 )
The output,
[[ 0. 0. -15.42494847]]
[[-0. -0. 0.]]
Notice that the shapes are still preserved. In your implementation, np.dot instead sums over the elements and collapses each term to an array of shape [ 1, 1 ].
Finally, we add them and take the negative of their mean using np.mean() (with no axis argument, this averages over all elements),
o = -np.mean( p1 + p2 )
print( o )
The output is,
5.141649490132791
You can check the problem in your implementation by printing the shape of each of the terms.
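As a rough sketch of that check, the snippet below replays the question's computation step by step and prints the shapes. The variable names total and n_samples are mine and only serve the illustration.

```python
import numpy as np

# Same inputs as in the question, reshaped to column vectors of shape (3, 1).
y_true = np.array([1., 1., 1.]).reshape(-1, 1)
y_pred = np.array([1., 1., 0.]).reshape(-1, 1)
y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)

print(y_true.shape)          # (3, 1): three samples, one output each

m = y_true.shape[1]          # this picks the output dimension, not the sample count
n_samples = y_true.shape[0]  # hypothetical name for the count that was intended
print(m, n_samples)          # 1 3

# np.dot of a (1, 3) row with a (3, 1) column sums over the samples.
total = (np.dot(y_true.T, np.log(y_pred))
         + np.dot((1 - y_true).T, np.log(1 - y_pred)))
print(total.shape)           # (1, 1)

print(-total / m)            # about 16.118, the value reported in the question
print(-total / n_samples)    # about 5.37, the sum divided by the number of samples
```

Dividing by the number of samples brings the result to about 5.37 rather than 5.14. The remaining gap appears to come from the extra epsilon that the versions in these answers add inside the logarithm.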
There's some issue with your implementation. Here is the correct one with numpy.
def BinaryCrossEntropy(y_true, y_pred):
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    term_0 = (1-y_true) * np.log(1-y_pred + 1e-7)
    term_1 = y_true * np.log(y_pred + 1e-7)
    return -np.mean(term_0+term_1, axis=0)
print(BinaryCrossEntropy(np.array([1, 1, 1]).reshape(-1, 1),
np.array([1, 1, 0]).reshape(-1, 1)))
[5.14164949]
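A quick arithmetic check on where 5.1416 comes from, under the assumption that only the third element (true label 1, prediction clipped to 1e-7) contributes noticeably to the sum. The names with_eps and without_eps are illustrative.

```python
import math

eps = 1e-7

# Prediction clipped to eps, plus the extra eps added inside the log, averaged over 3 values.
with_eps = -math.log(eps + eps) / 3

# Same average without the extra eps inside the log.
without_eps = -math.log(eps) / 3

print(with_eps)     # about 5.1416, matching the result above
print(without_eps)  # about 5.3727
```

So the additive epsilon inside the logarithm, and not just the clipping, is what makes the NumPy result line up with the Tensorflow value.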
Note, during tf.keras model training, it's better to use keras backend functionality. You can implement it, in the same way, using the keras backend utilities.
from tensorflow.keras import backend as K

def BinaryCrossEntropy(y_true, y_pred):
    # Clip predictions away from 0 and 1 so the logs stay finite.
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())
    term_1 = y_true * K.log(y_pred + K.epsilon())
    return -K.mean(term_0 + term_1, axis=0)
print(BinaryCrossEntropy(
np.array([1., 1., 1.]).reshape(-1, 1),
np.array([1., 1., 0.]).reshape(-1, 1)
).numpy())
[5.14164949]