Why do Keras/TensorFlow's sigmoid and crossentropy have low precision?

Question

I have the following simple neural network (with only 1 neuron) to test the computation precision of the sigmoid activation and binary_crossentropy in Keras:

model = Sequential()
model.add(Dense(1, input_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

To simplify the test, I manually set the only weight to 1 and the bias to 0, and then evaluate the model on the 2-point training set {(-a, 0), (a, 1)}, i.e.

import numpy as np

y = np.array([0, 1])
keras_ce, my_ce = {}, {}  # results keyed by a
for a in range(40):
    x = np.array([-a, a])
    keras_ce[a] = model.evaluate(x, y)[0]  # cross-entropy computed by keras/tensorflow
    my_ce[a] = np.log(1 + np.exp(-a))  # My own computation

My question: I found that the binary cross-entropy (keras_ce) computed by Keras/TensorFlow reaches a floor of 1.09e-7 when a is approximately 16, as illustrated below (blue line). It doesn't decrease further as a keeps growing. Why is that?

This neural network has only 1 neuron, whose weight is set to 1 and whose bias is 0. With the 2-point training set {(-a, 0), (a, 1)}, the binary_crossentropy is just

-1/2 [ log(1 - 1/(1+exp(a))) + log(1/(1+exp(-a))) ] = log(1+exp(-a))

So the cross-entropy should keep decreasing as a increases, as illustrated in orange ('my') above. Is there some Keras/TensorFlow/Python setting I can change to increase its precision? Or am I mistaken somewhere? I'd appreciate any suggestions/comments/answers.
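As a sanity check on the closed form above, here is a minimal NumPy sketch (float64, no Keras involved) that compares the explicit two-point average against log(1+exp(-a)) and confirms that it decreases with a. The function names are illustrative only.

```python
import numpy as np

def closed_form_ce(a):
    # log(1 + exp(-a)), written with log1p for accuracy when exp(-a) is tiny
    return np.log1p(np.exp(-a))

def explicit_ce(a):
    # Mean binary cross-entropy over the two points (-a, 0) and (a, 1)
    p_neg = 1.0 / (1.0 + np.exp(a))    # sigmoid(-a), label 0
    p_pos = 1.0 / (1.0 + np.exp(-a))   # sigmoid(a), label 1
    return -0.5 * (np.log(1.0 - p_neg) + np.log(p_pos))

a = np.arange(0, 10, dtype=np.float64)
closed = closed_form_ce(a)
explicit = explicit_ce(a)

print(np.allclose(closed, explicit))   # the two expressions agree
print(np.all(np.diff(closed) < 0))     # strictly decreasing in a
```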

Answer 1

TL;DR: the probability values (i.e. the outputs of the sigmoid function) are clipped for numerical stability when computing the loss function.

If you inspect the source code, you will find that using binary_crossentropy as the loss results in a call to the binary_crossentropy function in the losses.py file:

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

which in turn, as you can see, calls the equivalent backend function. When TensorFlow is the backend, that results in a call to the binary_crossentropy function in the tensorflow_backend.py file:

def binary_crossentropy(target, output, from_logits=False):
    """ Docstring ..."""

    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

As you can see, the from_logits argument is set to False by default. Therefore, the if condition evaluates to true, and as a result the values in output are clipped to the range [epsilon, 1-epsilon]. That's why, no matter how small or large a probability is, it cannot be smaller than epsilon or greater than 1-epsilon. And that explains why the output of the binary_crossentropy loss is also bounded.
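The effect of the clipping can be reproduced outside Keras. The following is a rough NumPy sketch in float64, not the Keras code path itself: clipped_ce is a hypothetical helper that clips the sigmoid output to [eps, 1-eps] before taking the log, using the same 1e-7 value as the Keras default. For the two-point set in the question both points contribute the same term, -log(sigmoid(a)), so one term is enough.

```python
import numpy as np

EPS = 1e-7  # same value as the Keras default epsilon

def clipped_ce(a, eps=EPS):
    # Probability assigned to the correct class for either point
    p = 1.0 / (1.0 + np.exp(-a))
    # Mimic the clipping step before the log is taken
    p = np.clip(p, eps, 1.0 - eps)
    return -np.log(p)

a = np.array([5.0, 10.0, 16.0, 20.0, 30.0, 40.0])
exact = np.log1p(np.exp(-a))   # unclipped loss, keeps shrinking
clipped = clipped_ce(a)        # stops shrinking once sigmoid(a) > 1 - eps
floor = -np.log1p(-EPS)        # -log(1 - eps), the smallest reachable loss

for ai, e, c in zip(a, exact, clipped):
    print(f"a={ai:4.0f}  exact={e:.3e}  clipped={c:.3e}")
```

In float64 the floor is about 1.0e-7 and is reached a little above a = 16, which matches the shape of the curve in the question. The slightly different value reported there (1.09e-7) is presumably due to Keras computing in float32.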

Now, what is this epsilon? It is a very small constant used for numerical stability (e.g. to prevent division by zero or other undefined behavior). To find its value you can inspect the source code further, and you will find it in the common.py file:

_EPSILON = 1e-7

def epsilon():
    """Returns the value of the fuzz factor used in numeric expressions.
    # Returns
        A float.
    # Example
    ```python
        >>> keras.backend.epsilon()
        1e-07
    ```
    """
    return _EPSILON

If, for any reason, you would like more precision, you can set epsilon to a smaller constant using the set_epsilon function from the backend:

def set_epsilon(e):
    """Sets the value of the fuzz factor used in numeric expressions.
    # Arguments
        e: float. New value of epsilon.
    # Example
    ```python
        >>> from keras import backend as K
        >>> K.epsilon()
        1e-07
        >>> K.set_epsilon(1e-05)
        >>> K.epsilon()
        1e-05
    ```
    """
    global _EPSILON
    _EPSILON = e

However, be aware that setting epsilon to an extremely small positive value or to zero may disrupt the stability of computations throughout Keras.
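To make that warning concrete, here is a small NumPy sketch under the assumption that the computation runs in float32 (the Keras default float type). It does not call Keras. It only shows that an epsilon of 1e-8 is too small to survive the subtraction 1 - epsilon in float32, so the upper clipping bound collapses to exactly 1 and the logit conversion log(output / (1 - output)) divides by zero.

```python
import numpy as np

one = np.float32(1.0)

# Upper clipping bound 1 - epsilon, evaluated in float32
upper_default = one - np.float32(1e-7)   # default epsilon
upper_tiny = one - np.float32(1e-8)      # a much smaller epsilon

default_keeps_gap = upper_default < one  # bound stays strictly below 1
tiny_collapses = upper_tiny == one       # bound rounds to exactly 1

# With the collapsed bound, a saturated output of 1.0 is not clipped at all
with np.errstate(divide="ignore"):
    logit_at_bound = np.log(upper_tiny / (one - upper_tiny))

print(default_keeps_gap, tiny_collapses, logit_at_bound)
```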

Answer 2

I think Keras takes numerical stability into account. Let's trace how Keras calculates the loss.

First,

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

Then,

def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))


    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

Notice that tf.clip_by_value is used for numerical stability.

Let's compare Keras binary_crossentropy, TensorFlow tf.nn.sigmoid_cross_entropy_with_logits, and a custom loss function (which eliminates the value clipping):

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
import keras

# keras
model = Sequential()
model.add(Dense(units=1, activation='sigmoid', input_shape=(
    1,), weights=[np.ones((1, 1)), np.zeros(1)]))
# print(model.get_weights())
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])

# tensorflow
G = tf.Graph()
with G.as_default():
    x_holder = tf.placeholder(dtype=tf.float32, shape=(2,))
    y_holder = tf.placeholder(dtype=tf.float32, shape=(2,))
    entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=x_holder, labels=y_holder))
sess = tf.Session(graph=G)


# keras with custom loss function
def customLoss(target, output):
    # if not from_logits:
    #     # transform back to logits
    #     _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
    #     output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    #     output = tf.log(output / (1 - output))
    output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
model_m = Sequential()
model_m.add(Dense(units=1, activation='sigmoid', input_shape=(
    1,), weights=[np.ones((1, 1)), np.zeros(1)]))
# print(model.get_weights())
model_m.compile(loss=customLoss,
                optimizer='adam', metrics=['accuracy'])


N = 100
xaxis = np.linspace(10, 20, N)
keras_ce = np.zeros(N)
tf_ce = np.zeros(N)
my_ce = np.zeros(N)
keras_custom = np.zeros(N)

y = np.array([0, 1])
for i, a in enumerate(xaxis):
    x = np.array([-a, a])
    # cross-entropy computed by keras/tensorflow
    keras_ce[i] = model.evaluate(x, y)[0]
    my_ce[i] = np.log(1+np.exp(-a))  # My own computation
    tf_ce[i] = sess.run(entropy, feed_dict={x_holder: x, y_holder: y})
    keras_custom[i] = model_m.evaluate(x, y)[0]
# print(model.get_weights())

plt.plot(xaxis, keras_ce, label='keras')
plt.plot(xaxis, my_ce, 'b',  label='my_ce')
plt.plot(xaxis, tf_ce, 'r:', linewidth=5, label='tensorflow')
plt.plot(xaxis, keras_custom, '--', label='custom loss')
plt.xlabel('a')
plt.ylabel('xentropy')
plt.yscale('log')
plt.legend()
plt.savefig('compare.jpg')
plt.show()

We can see that the TensorFlow result matches the manual computation, while Keras with the custom loss runs into numeric overflow, as expected.
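A likely reason for both observations can be sketched in NumPy. The TensorFlow documentation for tf.nn.sigmoid_cross_entropy_with_logits gives the formulation max(x, 0) - x * z + log(1 + exp(-abs(x))) for logits x and labels z, which never forms the probability explicitly. The helper below is a plain NumPy re-implementation of that formula for illustration, not the TensorFlow kernel. The second half shows that in float32 the sigmoid output at a = 20 rounds to exactly 1, which is why converting probabilities back to logits without clipping breaks down.

```python
import numpy as np

def stable_ce_from_logits(x, z):
    # max(x, 0) - x*z + log(1 + exp(-|x|)), evaluated directly on logits
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

a = np.float32(20.0)
x = np.array([-a, a], dtype=np.float32)      # logits (weight 1, bias 0)
z = np.array([0.0, 1.0], dtype=np.float32)   # labels

# Loss computed on logits keeps tracking log(1 + exp(-a))
stable = stable_ce_from_logits(x, z).mean()

# Going through probabilities instead: sigmoid(20) saturates in float32
p = np.float32(1.0) / (np.float32(1.0) + np.exp(-x))
saturated = p[1] == np.float32(1.0)

print(stable, np.log1p(np.exp(-20.0)), saturated)
```

If the model can be arranged to output logits (a linear final layer) and the loss is computed with from_logits=True, the clipping branch shown earlier is skipped and the loss follows the stable formula.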
