log_loss from sklearn gives nan, while tensorflow.losses.log_loss works
I have a binary classification problem. I am using the log loss from tensorflow.losses.log_loss. To check it, I use sklearn.metrics.log_loss. Most of the time, the two functions give the same result (the only difference is the dtype). In some instances, however, the sklearn function returns NaN while tf.losses.log_loss returns a correct value.
The data is here: https://pastebin.com/BvDgDnVT

Code:
import sklearn.metrics
import tensorflow as tf
y_true = [... see pastebin link]
y_pred = [... see pastebin link]
loss_sk = sklearn.metrics.log_loss(y_true, y_pred, labels=[0, 1]) # -> returns NaN
with tf.Session() as sess:
    loss_tf = tf.losses.log_loss(y_true, y_pred).eval(session=sess)  # -> returns 0.0549
There seems to be some log(0) happening, but why does tensorflow not have this problem?
Changing the dtype of both arrays to a 64-bit float (dtype=np.float64) fixes it, for example by adding y_pred = y_pred.astype(np.float64).
Another way of fixing the issue is to provide eps=1e-7 to log_loss, which is a more appropriate epsilon for float32 and is what tensorflow is using. Scikit-learn, however, uses 1e-15 as the default (expecting float64).
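A minimal sketch of both workarounds applied to a call like the one in the question (the arrays are stand-ins for the pastebin data, and eps is assumed to still be accepted by your scikit-learn version; the keyword was deprecated in later releases):

import numpy as np
import sklearn.metrics

y_true = np.array([0, 1, 1, 0])                              # stand-in for the pastebin labels
y_pred = np.array([0.1, 1.0, 0.97, 0.03], dtype=np.float32)  # stand-in, contains a saturated 1.0

# Why the float32 case fails: the default clipping bound 1 - 1e-15 rounds back to
# exactly 1.0 in float32, so a saturated prediction is never pulled off the boundary.
print(np.float32(1 - 1e-15))   # 1.0   -> log(1 - p) can still hit log(0)
print(np.float32(1 - 1e-7))    # < 1.0 -> a float32-sized epsilon keeps p away from 1

# Fix 1: promote the predictions to float64 so the default eps=1e-15 survives the clip
loss_f64 = sklearn.metrics.log_loss(y_true, y_pred.astype(np.float64), labels=[0, 1])

# Fix 2: keep float32 but pass a float32-appropriate epsilon (what tensorflow uses)
loss_eps = sklearn.metrics.log_loss(y_true, y_pred, labels=[0, 1], eps=1e-7)

print(loss_f64, loss_eps)      # both finite now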