
-Infinity values in Java

I'm trying to sum 561 logarithms.
They look like this:

-7.314254939475686
-7.656004233197743
-4.816276208120333
-8.426112454893817
-4.771824445549499
-9.34240318676797  

So they're not big numbers. However, when I proceed with summing them I get this:

-2668.179647264475
-2674.7747795369874
-2679.18920466334
-2683.9724816026214
-2690.3342661536453
-Infinity
-Infinity  

The code that does it is:

double probspam = 0;

// Add the log of each word's class probability to the running sum.
for (int j = 0; j < words.size(); j++)
{
    probspam += Math.log(spam.getClassProbability(words.get(j)));
}

Do you have any idea why this happens and how to get around the -Infinity issue? Thank you.

For some values, spam.getClassProbability() returns 0.0. See the docs for Math.log:

If the argument is positive zero or negative zero, then the result is negative infinity.

The Javadoc for Math explains why you get -Infinity as a result:

If the argument is positive zero or negative zero, then the result is negative infinity.

You should check your values for zeros, or filter them out prior to applying the log function.
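A minimal sketch of such a check, listing the words whose probability is not strictly positive before any log is taken. The class name, the helper findNonPositive and the sample probabilities are hypothetical; in the question the values would come from spam.getClassProbability(word).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class ZeroProbabilityCheck {

    // Collects every word whose probability is not strictly positive.
    // The test !(p > 0.0) is true for 0.0, -0.0, negative values and NaN.
    static List<String> findNonPositive(List<String> words,
                                        ToDoubleFunction<String> classProbability) {
        List<String> offenders = new ArrayList<>();
        for (String word : words) {
            double p = classProbability.applyAsDouble(word);
            if (!(p > 0.0)) {
                offenders.add(word);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Hypothetical probabilities standing in for
        // spam.getClassProbability(word) from the question.
        Map<String, Double> probabilities = new HashMap<>();
        probabilities.put("offer", 0.0007);
        probabilities.put("meeting", 0.0);
        probabilities.put("free", 0.008);

        List<String> words = Arrays.asList("offer", "meeting", "free");
        List<String> offenders = findNonPositive(words, w -> probabilities.get(w));
        System.out.println(offenders); // prints [meeting]
    }
}
```

Knowing which words produce a zero makes it easier to decide whether to skip them, smooth them, or fix the probability estimate itself.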

Most likely the value of spam.getClassProbability(words.get(j)) is zero at some point.

Math.log(0.0) returns negative infinity (as the API documentation says).

One of your spam candidates is getting a zero from getClassProbability:

System.out.println(Math.log(0));

Output:

-Infinity

This is a special reserved double value, and adding any finite number to it still gives -Infinity, so once the loop hits a zero probability, your summing variable will stay at -Infinity.
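A small sketch of that behaviour, using made-up probabilities (the array contents and the class name are assumptions, not values from the question):

```java
final class NegativeInfinityDemo {

    // Sums Math.log(p) over all probabilities, as the loop in the question does.
    static double sumOfLogs(double[] probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // The single 0.0 in the middle is enough to poison the whole sum.
        double[] probabilities = {0.001, 0.0, 0.002};
        double sum = sumOfLogs(probabilities);

        System.out.println(sum);                    // prints -Infinity
        System.out.println(Double.isInfinite(sum)); // prints true
    }
}
```

Double.isInfinite (or Double.isFinite) can be used to detect the problem at the point where it first occurs.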

To "fix" it, do this:

double wordProbSpam = spam.getClassProbability(words.get(j));
probspam += wordProbSpam > 0 ? Math.log(wordProbSpam) : 0;

Frankly, I think your approach is flawed. I would simply sum the results of getClassProbability() rather than their logs, because for numbers between 0 and 1 the log is negative, which will do weird things to the sum.

If a word's class probability is zero, -Infinity is added to the sum.

I think this has already been answered in general: you are taking the log of 0.0. Even if your getClassProbability() is perfect, numerical underflow may still mean it returns zero when, mathematically speaking, the result is non-zero.
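One way such underflow can arise is sketched below. It assumes a probability is built up as a product of many small factors, which is only an assumption, since the question does not show how getClassProbability() is implemented. The factor 0.001 and the class name are hypothetical; 561 is the count from the question.

```java
final class UnderflowDemo {

    // Computes p^n by repeated multiplication in plain double arithmetic.
    static double product(double p, int n) {
        double result = 1.0;
        for (int i = 0; i < n; i++) {
            result *= p;
        }
        return result;
    }

    // Computes the same quantity in log space: n additions of log(p).
    static double sumOfLogs(double p, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        double p = 0.001; // hypothetical per-word probability
        int n = 561;      // number of terms mentioned in the question

        // Mathematically 0.001^561 is non-zero, but it is far below the
        // smallest positive double, so the product underflows.
        System.out.println(product(p, n));   // prints 0.0
        System.out.println(sumOfLogs(p, n)); // a finite value near -3875
    }
}
```

Taking Math.log of such an underflowed product would again give -Infinity, even though no individual factor was zero.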

One option is to replace all zeros with the value of Math.ulp(0.0), which equals Double.MIN_VALUE. This is the smallest positive value a Java double can represent (4.9e-324), and its log is around -744.44. This sidesteps the game-breaking concept of a zero probability. After all, spammers are very clever, so the probability will never truly be zero.
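A minimal sketch of that replacement. The class and method names are hypothetical, and the sample array stands in for the values returned by spam.getClassProbability().

```java
final class SafeLog {

    // Clamps zero (and negative) inputs up to Double.MIN_VALUE before
    // taking the log, so the result is about -744.44 instead of -Infinity.
    // NaN inputs are passed through unchanged as NaN.
    static double safeLog(double p) {
        return Math.log(Math.max(p, Double.MIN_VALUE));
    }

    // Sums the clamped logs of all probabilities.
    static double sumOfSafeLogs(double[] probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += safeLog(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Math.ulp(0.0) and Double.MIN_VALUE are the same value.
        System.out.println(Math.ulp(0.0) == Double.MIN_VALUE); // prints true

        // Hypothetical probabilities, one of them exactly zero.
        double[] probabilities = {0.001, 0.0, 0.002};
        double sum = sumOfSafeLogs(probabilities);
        System.out.println(Double.isInfinite(sum)); // prints false
    }
}
```

Each clamped zero contributes a penalty of roughly -744 to the sum, so a word with zero probability still counts heavily against the class without making the total unusable.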
