
Why do I keep getting NaN as output?

I am trying to write a simple gradient descent algorithm in C++ (for 10,000 iterations). Here is my program:

#include<iostream>
#include<cmath>

using namespace std;

int main(){

  double learnrate=10; 
  double x=10.0; //initial start value

  for(int h=1; h<=10000; h++){
     x=x-learnrate*(2*x + 100*cos(100*x));
  }

  cout<<"The minimum is at y = "<<x*x + sin(100*x)<<" and at x = "<<x;

  return 0;
}

The output ends up being y = nan and x = nan. I tried writing the values of x and y to a file, and after a certain number of iterations I am getting all nans (for x and y). Edit: I picked the learning rate (or step size) of 10 as an experiment; I will use much smaller values afterwards.
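A minimal sketch of that logging (the file name trace.csv is just for illustration); printing x every iteration makes the blow-up easy to spot:

#include <cmath>
#include <fstream>

int main(){

  double learnrate=10;
  double x=10.0; //initial start value

  std::ofstream trace("trace.csv"); //hypothetical log file
  for(int h=1; h<=10000; h++){
     x=x-learnrate*(2*x + 100*std::cos(100*x));
     trace<<h<<","<<x<<"\n"; //log each iterate
  }

  return 0;
}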

Something is wrong with your update step. The first 10 values of x are already exploding in magnitude:

-752.379
15290.7
-290852
5.52555e+06
-1.04984e+08
1.9947e+09
-3.78994e+10
7.20088e+11
-1.36817e+13
2.59952e+14

No matter what starting value you choose, the absolute value of the next x will almost always be bigger. With learnrate = 10, the update is

|next_x| = | x - 10 * (2*x + 100*cos(100*x)) | = | -19*x - 1000*cos(100*x) |

For example, consider what happens when you choose a very small starting value (|x| -> 0); then

|next_x| = | -19 * 0 - 1000 * cos(0) | = 1000
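
To make the divergence explicit: since |cos| <= 1, the triangle inequality gives

|next_x| = | -19*x - 1000*cos(100*x) | >= 19*|x| - 1000

so once |x| exceeds about 56 (where 19*|x| - 1000 > |x|), each iteration multiplies |x| by roughly a factor of 19, and the iterates race toward overflow.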

Print x before the call to the cosine function and you will see that the last number printed before the NaN (at h = 240) is:

-1.7761e+307

This means the value is overflowing: the next update pushes x past the largest representable double, so it becomes infinity, and on the following iteration cos(100*x) is evaluated at infinity, which yields NaN. That NaN then propagates through every remaining iteration.

It overflows the double type.

If you use long double, you will survive 1000 iterations, but you will still overflow the type with 10000 iterations.
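
A quick way to check that claim, assuming an x86 long double (max about 1.19e4932), is to run the same loop in long double and watch for the first non-finite value:

#include <cmath>
#include <iostream>

int main()
{
  long double learnrate = 10.0L;
  long double x = 10.0L; // same start value as the original program

  for (int h = 1; h <= 10000; h++)
  {
    x = x - learnrate * (2.0L * x + 100.0L * std::cos(100.0L * x));
    if (!std::isfinite(x)) // overflow (inf) or NaN
    {
      std::cout << "long double blew up at h = " << h << std::endl;
      return 0;
    }
  }
  std::cout << "still finite after 10000 iterations: x = " << x << std::endl;
  return 0;
}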

So the problem is that the parameter learnrate is just too big. You should take fewer steps while using a data type with a larger range, as I suggested above.

Because at h = 240 the variable x exceeds the limits of the double type (about 1.79769e+308). The iterates form a diverging, roughly geometric progression. You need to reduce your learn rate.

A couple more things: 1- Do not use "using namespace std;"; it is considered bad practice. 2- You can use the "std::isnan()" function to detect this situation.

Here is an example:

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
  double learnrate = 10.0;
  double x = 10.0; // initial start value

  std::cout << "double type maximum = " << std::numeric_limits<double>::max() << std::endl;

  bool failed = false;
  for (int h = 1; h <= 10000; h++)
  {
    x = x - learnrate * (2.0 * x + 100.0 * std::cos(100.0 * x));
    if (std::isnan(x)) // stop as soon as the iteration diverges
    {
      failed = true;
      std::cout << "NaN detected at h = " << h << std::endl;
      break;
    }
  }

  if (!failed)
    std::cout << "The minimum is at y = " << x * x + std::sin(100.0 * x) << " and at x = " << x << std::endl;

  return 0;
}

The "learn rate" is far too high. Change it to 1e-4, for example, and the program works, for an initial value of 10 at least. When the learnrate is 10, the iterations jump too far past the solution.

At its best, gradient descent is not a good algorithm. For serious applications you want to use something better. Much better. Search for Brent optimizer and BFGS.
