Why does the log loss function return undefined when the prediction has 1 or 0?

Question

Below is the R code I used to calculate log loss:

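LogLoss <- function(pred, res) {
    # Mean negative log-likelihood of the binary outcomes res given predicted probabilities pred
    (-1 / length(pred)) * sum(res * log(pred) + (1 - res) * log(1 - pred))
}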

However, if the prediction vector contains a 0 or a 1, the log loss comes out as Inf or NaN.

LogLoss(c(0.9,0,0.2), c(1,1,1))

[1] NaN

LogLoss(c(0.9,1,0.2), c(1,1,1))

[1] Inf

I don't quite understand why this is the case. Doesn't this make it impossible to calculate log loss if the predictions contain a 0 or 1?

Answer 1

The problem is that we are taking the logarithm of 0.

When we use log loss and a prediction is exactly 0 or 1, we usually clip it with a min/max rule so that it sits slightly away from 0 and 1.

For example,

> # pmin/pmax clip element-wise; min/max would collapse the vector to a single number
> pred = pmax(pmin(c(0.9,0,0.2), 1-10^-15), 10^-15)
> LogLoss(pred, c(1,1,1))
[1] 12.08452
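
The clipping can also be folded into the function itself. What follows is only a minimal sketch: the name LogLossClipped and the default eps of 1e-15 are assumptions made for illustration, not something taken from a package.

```r
# Hypothetical helper: log loss with predictions clipped to [eps, 1 - eps].
# The name LogLossClipped and the default eps are illustrative assumptions.
LogLossClipped <- function(pred, res, eps = 1e-15) {
    # pmin/pmax work element by element, so every prediction is clipped
    pred <- pmax(pmin(pred, 1 - eps), eps)
    -mean(res * log(pred) + (1 - res) * log(1 - pred))
}

loss_with_zero <- LogLossClipped(c(0.9, 0, 0.2), c(1, 1, 1))  # finite instead of Inf
loss_with_one  <- LogLossClipped(c(0.9, 1, 0.2), c(1, 1, 1))  # finite instead of NaN
```

The choice of eps matters: a confidently wrong prediction contributes about -log(eps) to the sum, so a smaller eps means a larger penalty for that observation.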

Remark:

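When I ran your examples, I got the opposite of what you posted. A prediction of 0 with res = 1 gives Inf, because log(0) = -Inf. A prediction of 1 with res = 1 gives NaN, because the second term becomes 0*log(0) = 0 * -Inf, and 0 times infinity is NaN.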
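
To see where each value comes from, it may help to look at the two terms of the sum separately, before they are added and averaged. This is just a sketch using the vectors from the question; the variable names are made up for the illustration.

```r
res <- c(1, 1, 1)

# Case 1: a prediction of exactly 0 when the outcome is 1
pred_zero  <- c(0.9, 0, 0.2)
first_term <- res * log(pred_zero)            # second element is 1 * log(0) = -Inf

# Case 2: a prediction of exactly 1 when the outcome is 1
pred_one    <- c(0.9, 1, 0.2)
second_term <- (1 - res) * log(1 - pred_one)  # second element is 0 * log(0) = NaN

print(first_term)
print(second_term)
```

A single -Inf in the sum makes the whole loss Inf after the sign flip, and a single NaN makes the whole loss NaN.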