Formulas for comparing float equality in PHP

The question of how to compare floating-point numbers has been answered here. This question is different because I'm asking about the formulas themselves. The two highest-voted answers give slightly different solutions to the problem:

if (abs(($a-$b)/$b) < $epsilon) { … }

and

if (abs($a-$b) < $epsilon) { … }

Why does the first answer contain the division? Doesn't it cause inaccurate results? For example (using easy numbers), let both $a and $b equal 0.01, and assume $a - $b results in 0.0001, with an $epsilon of 0.001.

(0.01 - 0.01) → 0.0001 (assumed);  0.0001 / 0.01 = 0.01;  0.01 < 0.001 : false

whereas

(0.01 - 0.01) → 0.0001 (assumed);  0.0001 < 0.001 : true
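To check the arithmetic above, here is a small sketch that plugs the assumed difference of 0.0001 straight into both formulas. The variable `$diff` is a hypothetical stand-in: in real PHP, 0.01 - 0.01 is exactly 0. Note that an error of 0.0001 on values around 0.01 is a 1% relative error, which is larger than a tolerance of 0.001 (0.1%).

```php
<?php
// Hypothetical numbers from the question: both values are 0.01 and we
// pretend the subtraction left an error of 0.0001 behind.
$b = 0.01;
$diff = 0.0001;      // assumed value of $a - $b, not what PHP actually computes
$epsilon = 0.001;

// First formula: the difference is measured relative to $b.
$relative = abs($diff / $b) < $epsilon;   // 0.0001 / 0.01 = 0.01, which is not < 0.001
// Second formula: the raw difference is compared directly.
$absolute = abs($diff) < $epsilon;        // 0.0001 < 0.001

var_dump($relative); // bool(false)
var_dump($absolute); // bool(true)
```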

My math may be a little rusty, but what am I missing?

When should I use the one formula over the other?

This is probably there so that epsilon is checked against the relative error rather than the absolute error.

Compare these two cases:

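<?php
// Relative comparison: the difference is scaled by $b, which must not be 0.
function areEqual(float $a, float $b) : bool {
    return abs(($a - $b) / $b) < 0.00001;
}
var_dump(areEqual(10000, 10000.01)); // bool(true)
var_dump(areEqual(0, 0.0000001));    // bool(false); arguments ordered so we never divide by 0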

A note on the example values above: our epsilon here is 0.00001 for convenience. The smallest possible epsilon (the machine epsilon, PHP_FLOAT_EPSILON, about 2.2e-16 for 64-bit floats) is much smaller than that anyway, so let's ignore this. Our algorithm assumes that $a and $b are similar in magnitude, so it does not matter whether we divide by $a or $b, as long as the divisor is not 0. Realistically, the problematic large values would be much larger than 10000 (with a very large exponent), and the problematic small values could be much smaller than 0.0000001. For convenience, though, let's assume these are the values that cause problems.

Now you can already see the difference.
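To make the difference concrete, the sketch below runs both formulas from the question on the same two pairs of values. The function names `absEqual` and `relEqual` are hypothetical. The divisor in `relEqual` is assumed to be non-zero, which is why the zero goes in the first argument.

```php
<?php
const EPSILON = 0.00001;

// Absolute check (hypothetical name): the raw difference must be below EPSILON.
function absEqual(float $a, float $b): bool {
    return abs($a - $b) < EPSILON;
}

// Relative check (hypothetical name): the difference is scaled by $b first ($b must not be 0).
function relEqual(float $a, float $b): bool {
    return abs(($a - $b) / $b) < EPSILON;
}

foreach ([[10000, 10000.01], [0, 0.0000001]] as [$a, $b]) {
    printf(
        "%s vs %s: absolute=%s, relative=%s\n",
        $a, $b,
        var_export(absEqual($a, $b), true),
        var_export(relEqual($a, $b), true)
    );
}
```

The large pair passes only the relative check, and the small pair passes only the absolute one.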

For large numbers: if the compared floats are extremely large, a fixed epsilon may be too small. A float can store only a limited number of significant digits (roughly 15 to 17 decimal digits for a 64-bit float), while the exponent can be far larger than that. As a result, the source of floating-point error, i.e. the last stored digits, can sit well above the units digit. In other words, for extremely large floats the absolute error can be greater than 1, which is far above our epsilon of 0.00001.
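Here is a quick check of this claim. It assumes PHP's float is a 64-bit IEEE 754 double, which is the common case. Near 1e20, neighbouring doubles are 2^14 = 16384 apart. That means even a single rounding step is far larger than 0.00001.

```php
<?php
// Near 1e20, adjacent doubles are 16384 apart, so the units digit
// cannot be stored at all.
var_dump(1e20 + 1 == 1e20);  // bool(true): the +1 is rounded away

// Two values that differ by a single rounding step at this magnitude.
$a = 1e20;
$b = 1e20 + 16384;  // the next representable double above 1e20

var_dump(abs($a - $b) < 0.00001);         // bool(false): absolute error is 16384
var_dump(abs(($a - $b) / $b) < 0.00001);  // bool(true): relative error is about 1.6e-16
```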

For small numbers: this is even more obvious. Both numbers are already smaller than epsilon. Even when one of them is compared with 0, where the relative error is infinitely large, the absolute check still treats them as equal. In this case you must either scale both operands up or decrease epsilon. The two are equivalent, but in terms of implementation it is more convenient to divide the difference by one of the operands. This scales the difference up for small numbers (/ 0.0001 is equivalent to * 10000) and down for large numbers (/ 10000, while the difference is hopefully much smaller than 10000).
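The two remedies can be sketched side by side. Scaling epsilon by the magnitude of one operand (`abs($a - $b) < $eps * abs($b)`) is essentially the same test as dividing the difference by `$b`. The two forms can only disagree right at the boundary, because of rounding. The scaled form also avoids the division, so a `$b` of 0 no longer throws a `DivisionByZeroError` (PHP 8) or produces INF (PHP 7). The function names are hypothetical. The `$a == $b` shortcut is an added assumption so that 0 compares equal to 0.

```php
<?php
// Relative check written as a division (hypothetical name).
function relEqualDiv(float $a, float $b, float $eps): bool {
    return abs(($a - $b) / $b) < $eps;
}

// The same test with epsilon scaled by |$b| instead (hypothetical name).
// The $a == $b shortcut makes 0 vs 0 compare equal; nothing is divided.
function relEqualScaled(float $a, float $b, float $eps): bool {
    return $a == $b || abs($a - $b) < $eps * abs($b);
}

$eps = 0.00001;
var_dump(relEqualDiv(0.0000001, 0.0000002, $eps));    // bool(false): 50% apart
var_dump(relEqualScaled(0.0000001, 0.0000002, $eps)); // bool(false)
var_dump(relEqualScaled(0.0000001, 0.0, $eps));       // bool(false), and no division by zero
var_dump(abs(0.0000001 - 0.0000002) < $eps);          // bool(true): the absolute check is fooled
```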

There is another name for this check. While abs($a - $b) is called the absolute error, what we usually want is the relative error: the absolute error divided by the reference value (here $b). Since the values can be negative as well, we take the abs of the whole expression, ($a - $b) / $b, instead. Our "epsilon" of 0.00001 then means that the tolerated relative error is 0.00001, i.e. 0.001%.
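In symbols, taking b as the reference value and using the tolerance and the large-number case from the example above:

```latex
\text{absolute error} = |a - b|, \qquad
\text{relative error} = \left|\frac{a - b}{b}\right| = \frac{|a - b|}{|b|} < \varepsilon, \qquad
\varepsilon = 0.00001 = 0.001\,\%

% Worked example with a = 10000, b = 10000.01:
\frac{|10000 - 10000.01|}{|10000.01|} = \frac{0.01}{10000.01} \approx 1.0 \times 10^{-6} < 10^{-5}
```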


Keep in mind that even this is not completely safe. After many transformations in your program, you may, for example, add your numbers to (or multiply them by) some big numbers and then subtract the big numbers off again. This leaves behind error from the big numbers that is negligible to a human but significant compared with your epsilon. So always think twice before choosing an epsilon value or a float comparison algorithm.
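Here is a minimal sketch of this effect, assuming 64-bit IEEE 754 doubles. A small value is added to a big number, and then the big number is subtracted again. Mathematically nothing changes, but the small value does not survive the round trip. Even the relative check rejects the result.

```php
<?php
// Near 1e16, adjacent doubles are 2 apart, so adding 0.1 to 1e16 rounds
// straight back to 1e16 and the 0.1 is lost.
$x = 0.1;
$roundTrip = ($x + 1e16) - 1e16;   // mathematically this is just $x

var_dump($roundTrip);                             // float(0)
var_dump(abs(($roundTrip - $x) / $x) < 0.00001);  // bool(false): the relative error is 100%
```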

As a best practice, avoid adding, subtracting, or multiplying big numbers with small numbers, because doing so increases the chance of errors. When developing (and especially when simplifying) your algorithms, always take into account that there might be an error in your floats. This may increase your workload to a ridiculous extent, but as long as you are aware of it, this kind of worry can sometimes save you from getting kicked out of a team.
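One way to follow this advice is to combine the small numbers with each other before they meet a big one. The sketch below is a hypothetical illustration that assumes 64-bit doubles with the default round-to-nearest-even mode. It adds the same three numbers in two different orders.

```php
<?php
// Hypothetical illustration: the same three numbers, added in two orders.
// Near 1e16 adjacent doubles are 2 apart, so each "+ 1" on its own is
// rounded away (the tie rounds to the even neighbour, which is 1e16 here).
$bigFirst   = ((1e16 + 1) + 1) - 1e16;
// Adding the small numbers to each other first keeps their sum (2),
// which is large enough to survive being added to 1e16.
$smallFirst = ((1 + 1) + 1e16) - 1e16;

var_dump($bigFirst);    // float(0)
var_dump($smallFirst);  // float(2)
```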

It is a question of accuracy versus precision. If you skip the division, you are looking at precision: you know your standard, and being correct to the nth decimal place (or whatever you choose) is good enough. If you do the division, you are talking about accuracy (think percent error): being off by an ounce is awful if you are selling a gold earring and fine if you are weighing yourself.
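The earring analogy can be put into numbers. The weights below are hypothetical (an assumed 0.25 oz earring and an assumed 2400 oz, i.e. 150 lb, person). Each is measured exactly one ounce too heavy, and the helper `relativeError` is a hypothetical name. The same absolute error gives wildly different relative errors.

```php
<?php
// Hypothetical helper: relative error of a measurement against the actual value.
function relativeError(float $measured, float $actual): float {
    return abs(($measured - $actual) / $actual);
}

$earring = 0.25;   // assumed weight of a gold earring, in ounces
$person  = 2400.0; // assumed body weight (150 lb), in ounces

printf("earring: %.2f%%\n", 100 * relativeError($earring + 1, $earring)); // 400.00%
printf("person:  %.4f%%\n", 100 * relativeError($person + 1, $person));   // 0.0417%
```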

The division is there to account for the fact that floating-point numbers have an exponent. This means that the gap between two consecutive representable numbers grows as the numbers themselves grow.

For example, 1e-300 and 2e-300 are two distinct numbers (for an IEEE 754 64-bit type), but 1e300 and 1e300 + 1e-300 are the same.
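The growing gap can be measured directly. The sketch below uses a hypothetical helper, `gapAbove`, which finds the next representable double by adding 1 to the value's IEEE 754 bit pattern via `pack`/`unpack`. It assumes 64-bit PHP (needed for the `'q'` format) and a positive, finite input.

```php
<?php
// Hypothetical helper: the gap between a positive, finite double and the next
// representable double, found by adding 1 to its IEEE 754 bit pattern.
// Assumes 64-bit PHP, where 'q' packs a 64-bit integer in machine byte order.
function gapAbove(float $x): float {
    $bits = unpack('q', pack('d', $x))[1];
    $next = unpack('d', pack('q', $bits + 1))[1];
    return $next - $x;
}

var_dump(gapAbove(1.0) === PHP_FLOAT_EPSILON); // bool(true): the gap at 1.0 is 2^-52
var_dump(gapAbove(1e16));                      // float(2)
var_dump(1e-300 == 2e-300);                    // bool(false): the gap near 1e-300 is far below 1e-300
var_dump(1e300 + 1e-300 == 1e300);             // bool(true): the gap near 1e300 dwarfs 1e-300
```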

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 