log_loss from sklearn gives NaN, while tensorflow.losses.log_loss works

Question

I have a binary classification problem. I am using the log_loss from tensorflow.losses.log_loss.

To check, I use sklearn.metrics.log_loss. Most of the time, the two functions give the same result (differing only in dtype). In some instances, the sklearn function returns NaN while tf.losses.log_loss returns a correct value.

The data is here: https://pastebin.com/BvDgDnVT

Code:

import sklearn.metrics
import tensorflow as tf
y_true = [... see pastebin link]
y_pred = [... see pastebin link]
loss_sk = sklearn.metrics.log_loss(y_true, y_pred, labels=[0, 1]) # -> returns NaN
with tf.Session() as sess:
    loss_tf = tf.losses.log_loss(y_true, y_pred).eval(session=sess) # -> returns 0.0549

There seems to be some log(0) happening, but why does tensorflow not have this problem?

Answer 1

Changing the dtype of both arrays to a 64-bit float fixes it:

dtype=np.float64

For example, add y_pred = y_pred.astype(np.float64) before calling log_loss.
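To see why the cast helps, here is a small NumPy-only sketch. It assumes the NaN comes from clipping predictions to [eps, 1 - eps] with eps = 1e-15 while the data is float32; it is an illustration of that mechanism, not scikit-learn's actual code, and the variable names are made up for the example.

```python
import numpy as np

eps = 1e-15  # the default epsilon mentioned in this thread

# A float32 prediction of exactly 1.0 for a positive example (y = 1).
p32 = np.array([1.0], dtype=np.float32)
p64 = p32.astype(np.float64)

# In float32 the upper clipping bound 1 - eps rounds to exactly 1.0,
# so clipping does nothing to a prediction of 1.0.
upper32 = np.float32(1 - eps)
# float64 can represent a value strictly below 1.0.
upper64 = np.float64(1 - eps)

clipped32 = np.minimum(p32, upper32)
clipped64 = np.minimum(p64, upper64)

with np.errstate(divide="ignore", invalid="ignore"):
    # Negative-class term (1 - y) * log(1 - p) with y = 1.
    term32 = np.float32(0.0) * np.log(1 - clipped32)  # 0 * log(0) = 0 * -inf
    term64 = 0.0 * np.log(1 - clipped64)              # 0 * finite number

print(upper32 == 1.0, term32, term64)
```

In float32 the term is 0 * -inf, which is NaN and poisons the mean; in float64 the log stays finite and the term is zero.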

Answer 2

Another way of fixing the issue is to pass eps=1e-7 to log_loss, which is a more appropriate epsilon for float32 and is what tensorflow uses. Scikit-learn, however, uses 1e-15 as its default (expecting float64).
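A rough sketch of the effect of the epsilon, using a hand-written binary log loss rather than either library. The helper binary_log_loss and the sample values are hypothetical, and the clipping is deliberately done in the dtype of the predictions to mimic the float32 case.

```python
import numpy as np

def binary_log_loss(y_true, y_pred, eps):
    """Hypothetical helper: mean binary cross-entropy, clipping to [eps, 1 - eps]."""
    dtype = y_pred.dtype
    y_true = np.asarray(y_true, dtype=dtype)
    # Cast the bounds to the prediction dtype so the clip happens in that precision.
    eps = dtype.type(eps)
    one = dtype.type(1)
    p = np.clip(y_pred, eps, one - eps)
    with np.errstate(divide="ignore", invalid="ignore"):
        losses = -(y_true * np.log(p) + (one - y_true) * np.log(one - p))
    return losses.mean()

y_true = [1, 0, 1]
y_pred = np.array([1.0, 0.1, 0.8], dtype=np.float32)  # made-up float32 predictions

loss_default_eps = binary_log_loss(y_true, y_pred, eps=1e-15)  # 1 - eps rounds to 1.0
loss_float32_eps = binary_log_loss(y_true, y_pred, eps=1e-7)   # 1 - eps stays below 1.0

print(loss_default_eps, loss_float32_eps)
```

With eps=1e-15 the result is NaN, with eps=1e-7 it is finite. As far as I know, newer scikit-learn releases have changed how eps is handled, so check the log_loss documentation for the version you have installed before relying on this parameter.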

2 answers

Answer 1: score 5, accepted, 2018-05-03 21:50:03

Answer 2: score 2, 2020-02-20 10:43:48
