Why does sigmoid & crossentropy of Keras/tensorflow have low precision?
I have the following simple neural network (with only 1 neuron) to test the computational precision of the sigmoid activation and binary_crossentropy in Keras:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To simplify the test, I manually set the only weight to 1 and the bias to 0, and then evaluate the model on the 2-point training set {(-a, 0), (a, 1)}, i.e.
import numpy as np

y = np.array([0, 1])
keras_ce = np.zeros(40)
my_ce = np.zeros(40)
for a in range(40):
    x = np.array([-a, a])
    keras_ce[a] = model.evaluate(x, y)[0]  # cross-entropy computed by Keras/TensorFlow
    my_ce[a] = np.log(1 + np.exp(-a))  # my own computation
My question: I found that the binary cross-entropy (keras_ce) computed by Keras/TensorFlow reaches a floor of 1.09e-7 when a is approx. 16, as illustrated below (blue line). It doesn't decrease further as 'a' keeps growing. Why is that?
This neural network has only 1 neuron, whose weight is set to 1 and bias to 0. With the 2-point training set {(-a, 0), (a, 1)}, the binary_crossentropy is just
-1/2 [ log(1 - 1/(1+exp(a))) + log(1/(1+exp(-a))) ] = log(1+exp(-a))
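As a quick sanity check of the identity above, here is a minimal NumPy sketch in float64. The helper names (`sigmoid`, `two_point_cross_entropy`, `closed_form`) are made up for illustration and are not part of Keras.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_point_cross_entropy(a):
    """Mean binary cross-entropy over {(-a, 0), (a, 1)} with weight 1, bias 0."""
    p_neg = sigmoid(-a)  # prediction for x = -a, whose label is 0
    p_pos = sigmoid(a)   # prediction for x = a, whose label is 1
    return -0.5 * (np.log(1.0 - p_neg) + np.log(p_pos))

def closed_form(a):
    # log(1 + exp(-a)), written with log1p so small values keep their precision
    return np.log1p(np.exp(-a))

for a in [1.0, 5.0, 10.0]:
    print(a, two_point_cross_entropy(a), closed_form(a))
```

For moderate values of a the two columns should agree closely, which supports the simplification to log(1+exp(-a)).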
So the cross-entropy should decrease as a increases, as illustrated in orange ('my') above. Is there some Keras/TensorFlow/Python setting I can change to increase its precision? Or am I mistaken somewhere? I'd appreciate any suggestions/comments/answers.
TL;DR version: the probability values (i.e. the outputs of the sigmoid function) are clipped for numerical stability when computing the loss function.
If you inspect the source code, you will find that using binary_crossentropy as the loss results in a call to the binary_crossentropy function in the losses.py file:
def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
which in turn, as you can see, calls the equivalent backend function. When TensorFlow is the backend, that results in a call to the binary_crossentropy function in the tensorflow_backend.py file:
def binary_crossentropy(target, output, from_logits=False):
    """ Docstring ..."""
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
As you can see, the from_logits argument is set to False by default. Therefore, the if condition evaluates to true, and as a result the values in the output are clipped to the range [epsilon, 1-epsilon]. That's why, no matter how small or large a probability is, it cannot be smaller than epsilon or greater than 1-epsilon. And that explains why the output of the binary_crossentropy loss is also bounded.
Now, what is this epsilon? It is a very small constant used for numerical stability (e.g. to prevent division by zero or undefined behavior). To find its value, you can inspect the source code further; you will find it in the common.py file:
_EPSILON = 1e-7
def epsilon():
    """Returns the value of the fuzz factor used in numeric expressions.
    # Returns
        A float.
    # Example
    ```python
        >>> keras.backend.epsilon()
        1e-07
    ```
    """
    return _EPSILON
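To see how this epsilon produces the floor reported in the question, here is a rough NumPy emulation of the clip, logit, and cross-entropy steps in float32. It is only a sketch: it assumes Keras is computing in float32 (its default float type), the function name `clipped_loss` is hypothetical, and the last step uses the formula documented for tf.nn.sigmoid_cross_entropy_with_logits instead of calling TensorFlow.

```python
import numpy as np

EPS = np.float32(1e-7)   # the default epsilon shown above
ONE = np.float32(1.0)

def clipped_loss(a):
    """Emulate the backend binary_crossentropy on {(-a, 0), (a, 1)} in float32."""
    x = np.array([-a, a], dtype=np.float32)
    y = np.array([0.0, 1.0], dtype=np.float32)
    p = ONE / (ONE + np.exp(-x))        # sigmoid output of the single neuron
    p = np.clip(p, EPS, ONE - EPS)      # the clipping step
    logits = np.log(p / (ONE - p))      # transform back to logits
    # documented formula: max(x, 0) - x * z + log(1 + exp(-|x|))
    loss = np.maximum(logits, 0) - logits * y + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())

for a in [5.0, 10.0, 16.0, 20.0, 30.0]:
    print(a, clipped_loss(a), np.log1p(np.exp(-a)))
```

Under these assumptions the emulated loss follows log(1+exp(-a)) for small a and then stops decreasing at roughly 1e-7 once both predictions hit the clipping bounds, which is consistent with the plateau observed around a = 16.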
If, for any reason, you would like more precision, you can set the epsilon value to a smaller constant using the set_epsilon function from the backend:
def set_epsilon(e):
    """Sets the value of the fuzz factor used in numeric expressions.
    # Arguments
        e: float. New value of epsilon.
    # Example
    ```python
        >>> from keras import backend as K
        >>> K.epsilon()
        1e-07
        >>> K.set_epsilon(1e-05)
        >>> K.epsilon()
        1e-05
    ```
    """
    global _EPSILON
    _EPSILON = e
However, be aware that setting epsilon to an extremely low positive value or to zero may disrupt the stability of computations throughout Keras.
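One concrete way this can go wrong, sketched in NumPy under the assumption that the computation runs in float32: for a small enough epsilon, the upper clipping bound 1 - epsilon rounds to exactly 1.0, so the clip no longer keeps the output away from 1 and the later division by (1 - output) can hit zero. The helper `upper_clip_bound` is hypothetical and exists only for this illustration.

```python
import numpy as np

def upper_clip_bound(eps, dtype=np.float32):
    """Value of 1 - eps after rounding to the given float type."""
    return dtype(1.0) - dtype(eps)

for eps in [1e-7, 1e-8, 1e-9]:
    bound = upper_clip_bound(eps)
    # If the bound equals 1.0, clipping to [eps, 1 - eps] no longer protects
    # log(output / (1 - output)) from a zero denominator.
    print(eps, bound, bool(bound == np.float32(1.0)))
```

In float32 the bound stays below 1 for 1e-7 but collapses to 1.0 for 1e-8 and smaller, so lowering epsilon much below the default gains little unless the float type is also wider.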
I think Keras does this for the sake of numerical stability. Let's trace how Keras computes the loss.
First,
def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
Then,
def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.
    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.
    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
Notice that tf.clip_by_value is used for numerical stability.
Let's compare the Keras binary_crossentropy, the TensorFlow tf.nn.sigmoid_cross_entropy_with_logits, and a custom loss function (with the value clipping eliminated):
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
import keras
# keras
model = Sequential()
model.add(Dense(units=1, activation='sigmoid', input_shape=(1,),
                weights=[np.ones((1, 1)), np.zeros(1)]))
# print(model.get_weights())
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])
# tensorflow (TF 1.x graph API: tf.placeholder, tf.Session, tf.log)
G = tf.Graph()
with G.as_default():
    x_holder = tf.placeholder(dtype=tf.float32, shape=(2,))
    y_holder = tf.placeholder(dtype=tf.float32, shape=(2,))
    entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=x_holder, labels=y_holder))
sess = tf.Session(graph=G)
# keras with custom loss function
def customLoss(target, output):
    # Same as the backend binary_crossentropy, minus the clipping step:
    # if not from_logits:
    #     # transform back to logits
    #     _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
    #     output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    #     output = tf.log(output / (1 - output))
    output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
model_m = Sequential()
model_m.add(Dense(units=1, activation='sigmoid', input_shape=(1,),
                  weights=[np.ones((1, 1)), np.zeros(1)]))
# print(model.get_weights())
model_m.compile(loss=customLoss,
                optimizer='adam', metrics=['accuracy'])
N = 100
xaxis = np.linspace(10, 20, N)
keras_ce = np.zeros(N)
tf_ce = np.zeros(N)
my_ce = np.zeros(N)
keras_custom = np.zeros(N)
y = np.array([0, 1])
for i, a in enumerate(xaxis):
    x = np.array([-a, a])
    # cross-entropy computed by keras/tensorflow
    keras_ce[i] = model.evaluate(x, y)[0]
    my_ce[i] = np.log(1+np.exp(-a))  # My own computation
    tf_ce[i] = sess.run(entropy, feed_dict={x_holder: x, y_holder: y})
    keras_custom[i] = model_m.evaluate(x, y)[0]
# print(model.get_weights())
plt.plot(xaxis, keras_ce, label='keras')
plt.plot(xaxis, my_ce, 'b', label='my_ce')
plt.plot(xaxis, tf_ce, 'r:', linewidth=5, label='tensorflow')
plt.plot(xaxis, keras_custom, '--', label='custom loss')
plt.xlabel('a')
plt.ylabel('xentropy')
plt.yscale('log')
plt.legend()
plt.savefig('compare.jpg')
plt.show()
We can see that TensorFlow matches the manual computation, but Keras with the custom loss (no clipping) runs into numerical overflow, as expected.
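The TensorFlow curve stays accurate because the loss is computed directly from the logits, never from a saturated probability. A minimal NumPy sketch of that computation follows, using the formula documented for tf.nn.sigmoid_cross_entropy_with_logits; the function name `cross_entropy_from_logits` is an assumption made for this example, not a library API.

```python
import numpy as np

def cross_entropy_from_logits(logits, labels):
    """Stable sigmoid cross-entropy: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=np.float32)
    z = np.asarray(labels, dtype=np.float32)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

for a in [10.0, 20.0, 30.0]:
    loss = cross_entropy_from_logits([-a, a], [0.0, 1.0]).mean()
    print(a, loss, np.log1p(np.exp(-a)))
```

Even in float32 this keeps tracking log(1+exp(-a)) well past a = 16. In Keras, the analogous route would presumably be to feed logits (a linear output) into a loss that calls the backend function with from_logits=True, which skips the clipping branch shown earlier.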