
How to determine error in floating-point calculations?

I have the following equation I want to implement in floating-point arithmetic:

Equation: sqrt((a-b)^2 + (c-d)^2 + (e-f)^2)

I am wondering how the width of the mantissa affects the accuracy of the result, and what the correct mathematical approach to determining this is.

For instance, if I perform the following operations, how will the accuracy be affected after each step?

Here are the steps (a simulation sketch follows the list):

Step 1. Perform the following calculations in 32-bit single-precision floating point: x = (a-b), y = (c-d), z = (e-f)

Step 2. Round the three results to a mantissa of 16 bits (not including the hidden bit).

Step 3. Perform the squaring operations: x2 = x^2, y2 = y^2, z2 = z^2

Step 4. Round x2, y2, and z2 to a mantissa of 10 bits (after the binary point).

Step 5. Add the values: w = x2 + y2 + z2

Step 6. Round the result to a mantissa of 16 bits.

Step 7. Take the square root: sqrt(w)

Step 8. Round to a mantissa of 20 bits (not including the hidden bit).
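To make the steps concrete, here is a rough Python model of the pipeline I have in mind. The `round_to_bits` helper and the sample inputs are only placeholders, and ordinary doubles stand in for the exact intermediates (a real 32-bit implementation would also clamp the exponent); `math.hypot` with three arguments needs Python 3.8+.

```python
import math

def round_to_bits(x, bits):
    """Round x to `bits` mantissa bits (hidden bit excluded), round-to-nearest-even."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)             # hidden bit + `bits` fraction bits
    return math.ldexp(round(m * scale) / scale, e)  # Python's round() ties to even

def pipeline(a, b, c, d, e, f):
    # Step 1: differences, with the 23-bit mantissa of single precision
    x = round_to_bits(a - b, 23)
    y = round_to_bits(c - d, 23)
    z = round_to_bits(e - f, 23)
    # Step 2: round to 16 mantissa bits
    x, y, z = (round_to_bits(v, 16) for v in (x, y, z))
    # Steps 3-4: square, then round to 10 mantissa bits
    x2, y2, z2 = (round_to_bits(v * v, 10) for v in (x, y, z))
    # Steps 5-6: sum, then round to 16 mantissa bits
    w = round_to_bits(x2 + y2 + z2, 16)
    # Steps 7-8: square root, then round to 20 mantissa bits
    return round_to_bits(math.sqrt(w), 20)

# Compare against a double-precision reference to measure the overall error
a, b, c, d, e, f = 1.0, 0.9, 2.0, 1.7, 0.5, 0.1
reference = math.hypot(a - b, c - d, e - f)
result = pipeline(a, b, c, d, e, f)
print(result, reference, abs(result - reference) / reference)
```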

There are various ways of representing the error of a floating-point number. There is relative error (a * (1 + ε)), the subtly different ULP error (a + ulp(a) * ε), and absolute error (a + ε). Each of them can be used in analysing the error, but all have shortcomings. To get sensible results you often have to take into account what happens precisely inside floating-point calculations. I'm afraid that the 'correct mathematical approach' is a lot of work; instead, I'll give you the following.
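For a concrete feel for the three measures (a sketch only; the values are arbitrary and `math.ulp` needs Python 3.9+):

```python
import math

computed = 0.1 + 0.2              # what floating point gives us
exact    = 0.3                    # what we actually wanted
abs_err  = abs(computed - exact)          # absolute error: a + ε
rel_err  = abs_err / abs(exact)           # relative error: a * (1 + ε)
ulp_err  = abs_err / math.ulp(exact)      # ULP error: a + ulp(a) * ε
print(abs_err, rel_err, ulp_err)          # ulp_err comes out to about 1 here
```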

Simplified ULP-based analysis

The following analysis is quite crude, but it does give a good 'feel' for how much error you end up with; treat the numbers below as examples only.

(a-b): The operation itself gives you up to a 0.5 ULP error (when rounding to nearest even, RNE). The rounding error of this operation can be small compared to the inputs, but if the inputs are very similar and already contain error, you could be left with nothing but noise!
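For example (my own numbers, reusing the `round_to_bits` helper from the question's sketch): two nearly equal inputs are rounded to a 16-bit mantissa before subtracting, and the rounding noise dwarfs the true difference.

```python
import math

def round_to_bits(x, bits):
    # Same RNE helper as in the question sketch.
    m, e = math.frexp(x)
    scale = 2.0 ** (bits + 1)
    return math.ldexp(round(m * scale) / scale, e)

a, b = 1.000008, 1.0          # nearly equal inputs
exact = a - b                 # ~8e-6, about half an ULP of a 16-bit mantissa
noisy = round_to_bits(a, 16) - round_to_bits(b, 16)
print(exact, noisy)           # the 16-bit rounding error dominates the result
```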

(a^2): This operation multiplies not only the input, but also the input error. In relative-error terms, that means the error is at least multiplied by the other operand's mantissa. Interestingly, there is a little normalisation step in the multiplier, which means the relative error is halved if the multiplication result crosses a power-of-two boundary. The worst case is where the result lands just below such a boundary, e.g. two inputs that are both almost sqrt(2). In this case the input error is multiplied to 2*ε*sqrt(2). With an additional final rounding error of 0.5 ULP, the total is an error of ~2 ULP.
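A quick sketch of that doubling, with the input near sqrt(2) as described (the ε value is arbitrary):

```python
import math

x   = math.sqrt(2.0)           # x*x lands just below 2, the worst-case spot
eps = 2**-17                   # pretend x carries this much relative error
rel = ((x * (1 + eps))**2 - x * x) / (x * x)
print(rel / eps)               # ~2: squaring roughly doubles the relative error
```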

Adding positive numbers: The worst case here is just the input errors added together, plus another rounding error. We're now at 3*2 + 0.5 = 6.5 ULP.
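For illustration (arbitrary term values), the worst case where all three input errors point the same way:

```python
import math

terms = [0.11, 0.23, 0.31]               # stand-ins for x2, y2, z2
err   = 2 * math.ulp(1.0)                # pretend each carries ~2 ULP of error
exact = sum(terms)
worst = sum(t + err for t in terms)      # worst case: all errors share a sign
print((worst - exact) / err)             # ~3: the three input errors simply add
```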

sqrt: The worst case for a sqrt is when the input is close to, e.g., 1.0. The error roughly just gets passed through, plus an additional rounding error. We're now at 7 ULP.
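A quick numerical check (arbitrary ε): near 1.0 the derivative of sqrt is about 0.5, so the relative error is in fact roughly halved; treating it as passed through is therefore a safe over-estimate.

```python
import math

w   = 1.0                    # near the worst-case region mentioned above
eps = 2**-17                 # pretend w carries this much relative error
rel = (math.sqrt(w * (1 + eps)) - math.sqrt(w)) / math.sqrt(w)
print(rel / eps)             # ~0.5: sqrt halves the relative error, so
                             # 'passed through' is a conservative bound
```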

Intermediate rounding steps: It will take a bit more work to plug in your intermediate rounding steps. You can model these as an error related to the number of bits you're rounding off. E.g. going from a 23-bit to a 10-bit mantissa with RNE introduces an additional 2^(13-1) ULP error relative to the 23-bit mantissa, or 0.5 ULP relative to the new mantissa (you'll have to scale down your other errors if you want to work with that).
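That worst case can be checked directly; the construction below is my own (exactly half an ULP of the 10-bit mantissa, measured in 23-bit ULPs):

```python
import math

def round_to_bits(x, bits):
    # Same RNE helper as in the question sketch.
    m, e = math.frexp(x)
    scale = 2.0 ** (bits + 1)
    return math.ldexp(round(m * scale) / scale, e)

x = 1.0 + 2**-11              # half an ULP of a 10-bit mantissa above 1.0
r = round_to_bits(x, 10)      # ties-to-even sends this back down to 1.0
print(abs(r - x) / 2**-23)    # 4096 = 2**12 ULPs of the 23-bit mantissa
```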

I'll leave it to you to count the errors in your detailed example, but as the commenters noted, the rounding to a 10-bit mantissa will dominate, and your final result will be accurate to roughly 8 mantissa bits.
