IEEE floating point exception - why?

I'm writing a simple function to implement the secant method, and I think I'm running into precision issues.

First, here's the function (along with a main function to call it and a test function to use with it):

#include <stdio.h>
#include <float.h>

double secant_method(double(*f)(double), double a, double b){
    double c;
    for (int i = 0; i < 10; i++){
        c = a - f(a) * (a - b) / (f(a) - f(b));
        b = a; a = c;
    }
    return c;
}

typedef double (*func)(double);

//test function - x^3 + 4x - 10
double test(double x){
    return (x*x*x) + (4*x) - 10;
}

int main(){
    func f = &test;
    double ans;

    ans = secant_method(f, 0, 2);

    printf("\nRoot found:\t%.*g\n", DECIMAL_DIG * 2, ans);

    return 0;
}

Note: the for loop in secant_method() only runs 10 times. This is where my issue comes in.

When this runs as-is, everything is OK. It gives the correct output to 16 decimal places:

[screenshot of the output; image not preserved]

However, when I add iterations to the for loop in secant_method(), this happens:

[screenshot of the output; image not preserved]

Why does this happen? Have I reached the limit of what C can represent?


I read through this great answer from another post, but regarding the value I'm getting (-1.#IND), it only says that my result isn't a number, or that I'm performing some kind of illegal operation.

EDIT: using (x*x*x) + (4*x) - 10 + sin(x) as my test function gives the correct answer, but only if I loop while i < 9, instead of the i < 10 that works for (x*x*x) + (4*x) - 10.

-1.#IND is Microsoft's way of printing an indeterminate value, specifically NaN.

One way this can happen is 0 / 0, but I would print every intermediate operation to see where the issue lies:

double secant_method(double(*f)(double), double a, double b){
    double c = a;  // initialised so the first debug print does not read an indeterminate value
    printf("DBG =====\n");
    for (int i = 0; i < 10; i++){
        printf("\nDBG -----\n");
        printf("DBG i: %d\n",i);
        printf("DBG a: %.30f\n",a);
        printf("DBG b: %.30f\n",b);
        printf("DBG c: %.30f\n",c);
        printf("DBG f(a): %.30f\n",f(a));
        printf("DBG a-b: %.30f\n",a-b);
        printf("DBG f(b): %.30f\n",f(b));
        printf("DBG f(a)-f(b): %.30f\n",f(a)-f(b));
        printf("DBG f(a)*(a-b): %.30f\n",f(a)*(a-b));
        printf("DBG f(a)*(a-b)/(f(a)-f(b)): %.30f\n",f(a)*(a-b)/(f(a)-f(b)));

        c = a - f(a) * (a - b) / (f(a) - f(b));
        b = a; a = c;
    }
    return c;
}

Once you have that debug output, you can figure out what the actual issue is and adopt a strategy to avoid it.


When I do that, I see (at the end):

DBG -----
DBG i: 8
DBG a: 1.556773264394211375716281509085
DBG b: 1.556773264393484179635152031551
DBG c: 1.556773264394211375716281509085
DBG f(a): -0.000000000000000987057657830803
DBG a-b: 0.000000000000727196081129477534
DBG f(b): -0.000000000008196943991622962500
DBG f(a)-f(b): 0.000000000008195956933965131697
DBG f(a)*(a-b): -0.000000000000000000000000000718
DBG f(a)*(a-b)/(f(a)-f(b)): -0.000000000000000087577871187781

DBG -----
DBG i: 9
DBG a: 1.556773264394211375716281509085
DBG b: 1.556773264394211375716281509085
DBG c: 1.556773264394211375716281509085
DBG f(a): -0.000000000000000987057657830803
DBG a-b: 0.000000000000000000000000000000
DBG f(b): -0.000000000000000987057657830803
DBG f(a)-f(b): 0.000000000000000000000000000000
DBG f(a)*(a-b): -0.000000000000000000000000000000
DBG f(a)*(a-b)/(f(a)-f(b)): nan

Root found:     nan

So you can see that, on the tenth iteration, a and b have become equal, and hence so have f(a) and f(b). So you're evaluating the expression:

something * 0 / 0

which, as mentioned, reduces to 0 / 0 and therefore gives you NaN.
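The collapse to NaN can be reproduced in isolation. The following is a minimal sketch rather than part of the original answer: the helper names degenerate_step and is_nan_by_compare are made up for illustration, and the volatile qualifier is only there so that the compiler performs the arithmetic at run time instead of folding it away.

```c
#include <math.h>

/* Hypothetical helper: evaluates fa * 0 / 0, the shape the secant update
   takes once a == b. volatile keeps the arithmetic at run time. */
double degenerate_step(double fa) {
    volatile double zero = 0.0;
    return fa * zero / zero;
}

/* Hypothetical helper: NaN is the only value that compares unequal to
   itself, so this is a portable fallback where isnan() is unavailable. */
int is_nan_by_compare(double x) {
    return x != x;
}
```

Calling degenerate_step() with any finite value, for example the f(a) printed above, yields NaN; isnan() from <math.h> is the usual way to test for it. This assumes the compiler is not run with options such as -ffast-math that relax IEEE semantics.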


In terms of how to fix it, you just need to avoid dividing by zero, since that will give you either NaN or an infinity. So you could use the following function instead:

double secant_method(double(*f)(double), double a, double b){
    double c = a;  // fallback result if the loop exits on its very first pass
    for (int i = 0; i < 1000; i++) {
        if (f(a) == f(b)) break;
        c = a - f(a) * (a - b) / (f(a) - f(b));
        b = a; a = c;
    }
    return c;
}

A thousand iterations should be more than enough to get a decent answer, and the loop will exit early if you're ever about to divide by zero.
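A variant of the same idea, sketched here under my own assumptions rather than taken from the answer, caches the function values so that f is called once per iteration, and additionally stops when the step becomes negligible relative to the current estimate. The names secant_method_tol, tol, max_iter and cubic are hypothetical.

```c
#include <math.h>

/* Hypothetical variant of secant_method(): one call to f per iteration,
   an exact-zero guard on the denominator, and a relative step tolerance. */
double secant_method_tol(double (*f)(double), double a, double b,
                         double tol, int max_iter) {
    double fa = f(a);
    double fb = f(b);
    for (int i = 0; i < max_iter; i++) {
        double denom = fa - fb;
        if (denom == 0.0) break;              /* next step would divide by zero */
        double c = a - fa * (a - b) / denom;  /* secant update */
        b = a;  fb = fa;                      /* shift the previous point */
        a = c;  fa = f(a);                    /* evaluate only the new point */
        if (fabs(a - b) <= tol * fabs(a)) break;  /* step is negligible */
    }
    return a;  /* most recent estimate, even if no iteration ran */
}

/* The test function from the question: x^3 + 4x - 10 */
double cubic(double x) {
    return (x * x * x) + (4 * x) - 10;
}
```

Calling secant_method_tol(cubic, 0.0, 2.0, 1e-15, 1000) should land on the same root as the original function. The tolerance is a design choice: stopping on a small step usually ends the loop before a and b become bit-for-bit equal, while the denominator check remains as a safety net.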


If you want more precision, you could either look into the long double type or switch to one of the arbitrary precision arithmetic libraries such as GMP or MPIR.

Using such a library is usually more work, but you can achieve some impressive results. This program, built on MPIR:

#include <stdio.h>
#include <mpir.h>

void secant_method(mpf_t result, void(*f)(mpf_t, mpf_t), mpf_t a, mpf_t b){
    mpf_t fa, fb, temp1, temp2;

    mpf_init (fa);
    mpf_init (fb);
    mpf_init (temp1);
    mpf_init (temp2);

    for (int i = 0; i < 1000; i++){
        printf("DBG i: %d\n",i);

        f (fa, a);
        f (fb, b);
        if (mpf_cmp (fa, fb) == 0) break;

        mpf_set (temp1, a);
        mpf_sub (temp1, temp1, b);

        mpf_set (temp2, fa);
        mpf_sub (temp2, temp2, fb);

        mpf_set (result, fa);
        mpf_mul (result, result, temp1);
        mpf_div (result, result, temp2);
        mpf_sub (result, result, a);
        mpf_neg (result, result);

        mpf_set (b, a);
        mpf_set (a, result);
    }

    // release the temporaries
    mpf_clear (fa);
    mpf_clear (fb);
    mpf_clear (temp1);
    mpf_clear (temp2);
}

void test (mpf_t result, mpf_t x){
    mpf_t temp;

    mpf_set (result, x);
    mpf_pow_ui (result, result, 3);

    mpf_init_set (temp, x);
    mpf_mul_ui (temp, temp, 4);

    mpf_add (result, result, temp);

    mpf_set_ui (temp, 10);
    mpf_sub (result, result, temp);

    mpf_clear (temp);
}

int main(){
    mpf_t ans, a, b;

    mpf_set_default_prec (8000);

    mpf_init_set_ui (ans, 0);
    mpf_init_set_ui (a, 0);
    mpf_init_set_ui (b, 2);

    secant_method (ans, &test, a, b);

    mpf_out_str (stdout, 10, 0, ans);

    return 0;
}

outputs much more precision, about two and a half thousand digits:

DBG i: 1
:
DBG i: 19
0.155677326439421146326886324730853302634853266143
22856485101283627988036767055520913212330822780959
93349183787687346999781239000417393618333668026011
02048595843228945228507966189601958673920851932189
20626590635658264390975889008832048255537650792123
54916373054888140164770654992918100928227714960414
65208113116379497717707745267800989233875981344305
90022883167106124203999713536673991376957068731244
91919087980169395013246250812213656324598765244218
15974098310512802880727074335472786858740154287363
31949470951650710072488856623955478366217474755111
76368234254761541647442609230138418167182918204711
66713459423756284737546964906061587903876515793884
14091165347411853670752820576131460960421137744435
73729141652832258144582021037373967987171478026002
48487515446248979731517957120705447608265161099693
33098235693813752370774508652788986557620510981156
19907950657355934071535840759135251701581523712307
00051674680667972152582339710574822560693109306285
91240827697915787078746087225027856691436076089912
35551789799825731841345891629028445554314717823386
07885164744100235567602875364878328805811271289098
87558119684442289199181352023304600847178256323082
57317198584882656089836229208443415369358460418542
84083408696290686178971039756668669303212658278679
39542421457300944206839268283788585029652481323614
65995074020560963212330914882733926627309382310653
39023265929195094492468196461296569155421718696631
73798097369621805062145075113127308161572398104766
37356504104570136778437926442139603916930640425421
15655156674699552536588332891562053247342008145504
44336211031437923307615880759201695011419324719812
46482293928341901673056596202744639074280785106031
90197472588293352508389295101867514582271001202777
85575614897203080940643669476500979934666490279524
88486176409290187337498631681392563044899541391612
88438904336237873504970887963071622208868799638373
42186338496601471274609131141920820263780493617795
89714798662834913192777810386631915415021934333441
01797098172897161215116673422762953435902633516501
73788202968876596925999628999004575114529754782488
59959395407324243559011982543407738505315960009874
36510513519775603567237051670918870105777288994910
85524037720122749091827520695838000086150188462000
63190624219373460624686216781527327604063990319908
56547016812115842640285111265677758613385414834511
69237199199725030839166586376374587900611430229333
87296847315023767826706323911923435564643861604120
017381909481e1

And, if you take that number and pass it back into the test() function, you get a number rather close to zero, about -1.15 x 10^-2408. So I would say that's a pretty big improvement on using double.

And, for what it's worth, it only takes about a tenth of a second of CPU time, so it's at least feasible to do this with arbitrary precision arithmetic.


For even more precision, just change the default precision setting for MPIR, currently set as:

mpf_set_default_prec (8000);

Bumping that up to 100,000 gives an answer with over 30,000 significant digits, and a final "close-to-zero" answer of about -5 x 10^-30103.
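The digit counts quoted here line up with the argument to mpf_set_default_prec(), which is a minimum precision in bits rather than in decimal digits. As a rough sketch of that arithmetic (the helper name decimal_digits_for_bits is hypothetical, and the library is free to use more bits than requested):

```c
#include <math.h>

/* Hypothetical helper: approximate number of decimal digits carried by a
   binary precision of `bits` bits, i.e. bits * log10(2). */
double decimal_digits_for_bits(unsigned long bits) {
    return (double)bits * log10(2.0);
}
```

For 8000 bits this comes to roughly 2408 digits, and for 100,000 bits to roughly 30,103, which matches the exponents of the two "close-to-zero" residuals.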

The #IND is caused by a division of zero by zero. Insert a simple isnan(c) check into the loop while debugging, and you'll discover that a and b eventually become equal, which leads to both a - b and f(a) - f(b) being zero.
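A minimal sketch of such a check, assuming a C99 compiler for isnan(); the helper first_nan_iteration and the test function cubic_test are hypothetical names, and the update line is the unguarded one from the question:

```c
#include <math.h>

/* Hypothetical debugging helper: runs the unguarded secant update and
   returns the index of the first iteration whose result is NaN,
   or -1 if no NaN appears within max_iter iterations. */
int first_nan_iteration(double (*f)(double), double a, double b, int max_iter) {
    for (int i = 0; i < max_iter; i++) {
        double c = a - f(a) * (a - b) / (f(a) - f(b));
        if (isnan(c)) return i;  /* a and b have collapsed: 0 / 0 */
        b = a; a = c;
    }
    return -1;
}

/* The test function from the question: x^3 + 4x - 10 */
double cubic_test(double x) {
    return (x * x * x) + (4 * x) - 10;
}
```

Calling first_nan_iteration(cubic_test, 0.0, 2.0, 100) reports where the iteration breaks down; the exact index may vary with compiler and floating point settings.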

I believe casting is an issue. None of your numeric constants are interpreted by the compiler as double; they are treated as regular integers. Add a "." (or ".0") to the end of each constant to make it a double. test() should hence be changed to:

return (x*x*x) + (4.*x) - 10.;

And you must change the call in main to the following:

ans = secant_method(f, 0., 2.);

I tested this on Windows with DECIMAL_DIG defined as 9. I got #INF without the dots, and 5e-315 with the dots, using 11 iterations.

printf("\nRoot found:\t%.*g\n", DECIMAL_DIG * 2., ans);
