When evaluating with IEEE 754 floating-point numbers a and b, what is the worst-case error of the sum (a - b) + b, in terms of the magnitudes of a and b? How close to a can I expect the result to be?
The error can be 100% of a. b may be so large that a - b produces -b, and then (a - b) + b produces zero.
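For example, in IEEE-754 basic 64-bit binary with round-to-nearest-ties-to-even, (1 - 2^54) + 2^54 yields 0: 1 - 2^54 falls exactly halfway between two representable values and rounds to -2^54. We can also have 100% in the other direction. If a is 1 and b is 2^53 + 2, then a - b = -(2^53 + 1) is again a tie and rounds to -2^53, so (a - b) + b produces 2.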
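The two finite cases can be checked with a short sketch. It assumes Python floats are IEEE-754 binary64 with the default round-to-nearest-ties-to-even mode, which holds on common platforms; `round_trip` is a hypothetical helper name used only for illustration.

```python
def round_trip(a: float, b: float) -> float:
    """Evaluate (a - b) + b, with a rounding after each operation."""
    return (a - b) + b

a = 1.0

# Case 1: 1 - 2^54 rounds to -2^54, so adding b back gives 0.
b1 = 2.0 ** 54
r1 = round_trip(a, b1)
print(r1)  # 0.0

# Case 2: 1 - (2^53 + 2) rounds to -2^53, so adding b back gives 2.
b2 = 2.0 ** 53 + 2.0
r2 = round_trip(a, b2)
print(r2)  # 2.0

# In both cases the error relative to a is 100%.
print(abs(r1 - a) / abs(a), abs(r2 - a) / abs(a))  # 1.0 1.0
```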
Also, if b is infinity and a is finite, a - b is negative infinity, and (a - b) + b produces a NaN.
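The infinite case can be sketched the same way, again assuming Python floats follow IEEE-754 binary64 semantics; the variable names are illustrative only.

```python
import math

a = 1.0
b = math.inf

step = a - b       # finite minus infinity is -inf
result = step + b  # -inf + inf is an invalid operation, giving NaN

print(step, result)        # -inf nan
print(math.isnan(result))  # True
```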