简体   繁体   中英

How to compute total floating-point rounding error of a set of arithmetic computations in java using Math.ulp(double)?

I want to calculate the floating-point rounding error of a series of additions, multiplications, and divisions using the Math.ulp(double) method in Java. According to the wiki page on Unit in the Last place (ULP), it seems that the error from one floating-point calculation, say 2+3 or 2*3 would be 0.5*ulp(2+3) or 0.5*ulp(2*3), respectively, where 2*3 and 2+3 are the floating-point calculations. However, adding up these errors does not account for the actual error I get in the end product. Saying the maximum error, for example, of 2+3*4 = 0.5*ulp(2+[3*4]) + 0.5*ulp(3*4) does not seem to account for the actual error I get. Thus, I am confused, and perhaps I am misunderstanding Math.ulp(double) or maybe I need to use some kind of relative error. I don't know. Could anyone explain this to me and maybe give a few examples of addition, multiplication, and division with floating-point and exact numbers? It would be much appreciated.

I'm attempting to calculate the reduced row echelon form of a matrix for a Matrix class, and I need to know if, after a few calculations, certain items in the two dimensional array I'm using for the calculation are equal to 0. If a row is all zeroes, I exit the code. If it has a nonzero number in it, I divide that number by itself and then perform Gaussian elimination. The problem is that after performing a series of operations, floating-point error can creep in and calculations that should result in a zero end up as a nonzero number which then messes up my matrix calculation. Thus, I am trying to change the condition under which Gaussian elimination occurs from zero to less than a calculated error bound, and I am calculating the error bound for every item in the matrix based on the calculations done to that item, added together in a new error array. Here is my code:

/**
 * Finds the reduced row echelon form of the matrix using partial pivoting
 * @return rref: The reduced row echelon form of the matrix
 */
public Matrix rref()
{
    //ref()
    Matrix ref = copy();
    int iPivot = 0, jPivot = 0, greatestPivotRow;
    double[][] errorArray = new double[height][width];
    while(iPivot < height && jPivot < width)
    {
        do
        {
            //Finds row with greatest absolute-value-of-a-number at the horizontal value of the pivot position
            greatestPivotRow = iPivot;
            for(int n = iPivot; n < height; n++)
            {
                if(Math.abs(ref.getVal(n, jPivot)) > Math.abs(ref.getVal(greatestPivotRow, jPivot)))
                    greatestPivotRow = n;
            }
            //Swaps row at pivot with that row if that number is not 0 (Or less than the floating-point error)
            //If the largest number is 0, all numbers below in the column are 0, so jPivot increments and row swapper is repeated
            if(Math.abs(ref.getVal(greatestPivotRow, jPivot)) > errorArray[greatestPivotRow][jPivot])
                ref = ref.swapRows(iPivot, greatestPivotRow);
            else
                jPivot++;
        }
        while(jPivot < width && Math.abs(ref.getVal(greatestPivotRow, jPivot)) <= errorArray[greatestPivotRow][jPivot]); 
        if(jPivot < width)
        {
            //Pivot value becomes 1
            double rowMultiplier1 = 1/ref.getVal(iPivot,jPivot);
            for(int j = jPivot; j < width; j++)
            {
                ref.matrixArray[iPivot][j] = ref.getVal(iPivot,j) * rowMultiplier1;
                errorArray[iPivot][j] += 0.5 * (Math.ulp(ref.matrixArray[iPivot][j]) + Math.ulp(rowMultiplier1));
            }
            //1st value in nth row becomes 0
            for(int iTarget = iPivot + 1; iTarget < height; iTarget++)
            {
                double rowMultiplier0 = -ref.getVal(iTarget, jPivot)/ref.getVal(iPivot, jPivot);
                for(int j = jPivot; j < width; j++)
                {
                    errorArray[iTarget][j] += 0.5 * (Math.ulp(ref.getVal(iPivot, j) * rowMultiplier0) + Math.ulp(ref.getVal(iTarget, j)
                            + ref.getVal(iPivot, j)*rowMultiplier0) + Math.ulp(rowMultiplier0));
                    ref.matrixArray[iTarget][j] = ref.getVal(iTarget, j)
                            + ref.getVal(iPivot, j)*rowMultiplier0;
                }
            }
        }
        //Shifts pivot down 1 and to the right 1
        iPivot++;
        jPivot++;
    }

    //rref
    Matrix rref = ref.copy();
    iPivot = 1;
    jPivot = 1;
    //Moves pivot along the diagonal
    while(iPivot < height && jPivot < width)
    {
        //Moves horizontal position of pivot to first nonzero number in the row (the 1)
        int m = jPivot;
        while(m < width && Math.abs(rref.getVal(iPivot, m)) < errorArray[iPivot][m])
            m++;
        if(m != width)
        {
            jPivot = m;
            //1st value in rows above pivot become 0
            for(int iTarget = 0; iTarget < iPivot; iTarget++)
            {
                double rowMultiplier = -rref.getVal(iTarget, jPivot)/rref.getVal(iPivot, jPivot);
                for(int j = jPivot; j < width; j++)
                {
                    errorArray[iTarget][j] += 0.5 * (Math.ulp(rref.getVal(iTarget, j) * rowMultiplier) + Math.ulp(rref.getVal(iTarget, j)
                            + rref.getVal(iPivot, j)*rowMultiplier) + Math.ulp(rowMultiplier));
                    rref.matrixArray[iTarget][j] = rref.getVal(iTarget, j)
                            + rref.getVal(iPivot, j)*rowMultiplier;
                }
            }
        }
        iPivot++;
        jPivot++;
    }
    //Get rid of floating-point errors in integers
    for(int i = 0; i < height; i++)
    {
        for(int j =0; j < width; j++)
        {
            if(Math.abs(rref.getVal(i, j) - (int)(rref.getVal(i, j) + 0.5)) <= errorArray[i][j])
                rref.matrixArray[i][j] = (int)(rref.getVal(i, j) + 0.5);
        }
    }
    return rref;
}

The last part of the code, converting floating-point numbers less than the calculated error away from an integer value to that integer value is mostly to tell me if my error formula has worked, since some of the matrices I am calculating end up with, instead of integers, 5.000000000000004s and the like. Therefore, I know if I have a number very close to an integer but not the integer, I also know that my error bounds are not large enough, and apparently they are not, so I think I am doing something incorrectly.

My input matrix was one with the instance variable

double[][] matrixArray = {{1,-2,0,0,3}, {2,-5,-3,-2,6}, {0,5,15,10,0}, {2,6,18,8,6}};

And my result was the array

[[1.0, 0.0, 0.0, -2.0000000000000013, 3.0], [0.0, 1.0, 0.0, -1.0000000000000004, 0.0], [0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]

Though my error calculations fixed the problem with zeroes being turned into ones and then used in Gaussian elimination, I still have numbers that are not integers, so I know my error bounds are inaccurate. It may have worked in this case, but might not in the next without the correct error bounds.

2+3*4 = 0.5*ulp(2+[3*4]) + 0.5*ulp(3*4)

Errors compound. Like interest, the final error can grow exponentially. The operations in your example are exact, so it's difficult to see what you are complaining about (surely you did get exactly 14?). Are you taking into account the representation error that cause the constants involved in your computations not to be the mathematical values but also a 0.5ULP approximation of them?

In addition to the exponential growth of error when computed statically with the necessary precision, there is the problem that you are using inaccurate floating-point math to compute errors:

errorArray[iTarget][j] += 0.5 * (Math.ulp(rref.getVal(iTarget, j) * rowMultiplier) + Math.ulp(rref.getVal(iTarget, j)

The actual error can be more than computed by this statement because nothing prevents the floating-point addition from being a lower approximation of the mathematical result (the multiplications happen to be likely exact because one of the multiplicand is a power of two in each case).

In another programming language, you could change the rounding mode to “upwards” for this computation, but Java does not provide access to this functionality.


Here are a bunch of tangentially relevant remarks:

When the mathematically expected result is an integer, the usual way to obtain a double that is that integer is to ensure a 1ULP error for the entire computation. You almost never get a 1ULP bound for a computation that involves more than a couple of operations, unless you take special steps to ensure this bound (for instance Dekker multiplication ).

Java can use constants and print results in the hexadecimal format , and you should use that if you want to see what is happening exactly.

If you are interested in obtaining an upper bound on the final error along a specific computation, as opposed to statically for all computations, then interval arithmetic is slightly more accurate than characterization of error as a single absolute value, and requires much less thinking. In a context where you know by other means that the result has to be an integer, if the resulting interval contains only one integer, you would know for sure that this is the only possible answer.

If you are interested in calculating error bounds for Gaussian elimination process, that's a very complicated issue. For example, this paper gives a formula on upper bound of error: Higham NJ, Higham DJ. Large Growth Factors in Gaussian Elimination with Pivoting. SIAM Journal on Matrix Analysis and Applications. 1989;10(2):155.

The formula is: 在此处输入图片说明

Which is by no means simple!

On the other hand, if your goal is to prevent creeping floating point errors from ruining your zeros, I don't think you even need to create errorArray[][] . You will do fine by calculating in floating point and then setting a condition for precision with the help of Math.ulp() or machine epsilon. In this way, you won't need a final loop to "get rid" of those pesky zeros in the end!

You can also use java's BigDecimal and see if you get better results. Maybe this question and its given answers can help.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM