
Different floating point answers with different number of processes

I am new to learning MPI and I coded up the following simple program to perform integration with the trapezoidal rule using Open MPI on Ubuntu 10.10. Here is the code:

#include <iostream>
#include <cstdio>  // printf
#include <mpi.h>
#include <cstdlib> // atoi

//function to integrate
double f (double x )
{
  return 4.0/(1+x*x);
}


//integrates f on the interval [local_a, local_b] using local_n trapezoids of width h
double Trap(double local_a , double local_b, int local_n , double h)
{

  double integral ;
  double x;

  integral = ( f(local_a) + f(local_b) )/2.0;

  x = local_a ;

  for (int i = 1; i < local_n - 1; ++i)
    {
      x        += h; 
      integral += f(x);
    }

  integral *= h;
  return integral;
}

int main(int argc, char *argv[])
{

  int my_rank;
  int p;
  double a = 0.0;
  double b = 1.0;
  int n = atoi(argv[1]);//number of subdivisions of the interval
  double h;
  double local_a;
  double local_b;
  int local_n;

  double integral;
  double total;
  int source;
  int dest = 0;
  int tag = 0;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD,&p);//get number of processes
  MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);//get rank

  double start , finish;
  MPI_Barrier(MPI_COMM_WORLD);
  start = MPI_Wtime();  

////////////////////////////////////////////////////////////////////////////////////////////////////
  h = (b-a)/n;
  local_n = n/p;

  local_a = a + my_rank*local_n*h;
  local_b = local_a + local_n*h;

  integral = Trap(local_a , local_b , local_n , h);

  if (my_rank==0)
    {
      total = integral;

      for (source = 1; source < p; ++source)
        {
          MPI_Recv(&integral, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &status);
          total += integral;
        }
    }

  else
    {
      MPI_Send(&integral, 1, MPI_DOUBLE, dest, tag , MPI_COMM_WORLD);
    }

  if (my_rank == 0)
    {
      printf("With n=%d trapezoids our estimate \n", n );
      printf("Of the integral from %f to %f = %f \n" , a ,b , total);

    }

   ////////////////////////////////////////////////////////////////////////////////////////////////////
  MPI_Barrier(MPI_COMM_WORLD);
  finish = MPI_Wtime();

  if(my_rank == 0)  std::cout << "Time taken is " << finish - start << std::endl ; 

  MPI_Finalize();
  return 0;
}

The function being integrated is f(x) = 4.0/(1+x^2), which when integrated on [0,1] gives pi = 3.14159...
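For reference, the exact value follows from the standard antiderivative arctan(x) of 1/(1+x^2):

∫₀¹ 4/(1+x²) dx = 4·arctan(1) − 4·arctan(0) = 4·(π/4) − 0 = π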

Now when I ran the program with different numbers of processes I got different answers, and as you can see below, the difference is quite significant.

Desktop: mpirun -np 1 ./a.out 50000
With n=50000 trapezoids our estimate 
Of the integral from 0.000000 to 1.000000 = 3.141553 
Time taken is 0.000718832
Desktop: 
Desktop: 
Desktop: mpirun -np 2 ./a.out 50000
With n=50000 trapezoids our estimate 
Of the integral from 0.000000 to 1.000000 = 3.141489 
Time taken is 0.000422001
Desktop: 
Desktop: 
Desktop: 
Desktop: mpirun -np 3 ./a.out 50000
With n=50000 trapezoids our estimate 
Of the integral from 0.000000 to 1.000000 = 3.141345 
Time taken is 0.000365019
Desktop: 
Desktop: 
Desktop: 
Desktop: mpirun -np 4 ./a.out 50000
With n=50000 trapezoids our estimate 
Of the integral from 0.000000 to 1.000000 = 3.141362 
Time taken is 0.0395319

You've got two different problems in your code:

1. The integration bounds depend on the number of MPI processes, and are wrong when p does not divide n. Namely, the upper bound of the last process is

a + p * int(n/p) * (b-a)/n

which is different from b. I expect this to be the most important error in your code (unless there's another bug I haven't seen); one way to fix it is sketched after this list.

2. Floating-point addition is not associative. The result of your parallel algorithm, which is aggregated from partial sums, will thus depend on the number and the grouping of those partial sums.
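A minimal sketch of one way to fix the first problem, assuming the variable names from the question (local_n, local_a, local_b, computed per rank): hand the n % p leftover trapezoids to the first n % p ranks, so that all n trapezoids are assigned and the last rank's upper bound lands exactly on b.

#include <algorithm> // std::min

// Hedged sketch, not the original poster's code: partition n trapezoids
// of width h over p ranks so that all of [a, b] is covered even when
// p does not divide n. The first n % p ranks take one extra trapezoid.
void partition(int n, int p, int my_rank, double a, double h,
               int &local_n, double &local_a, double &local_b)
{
  local_n = n / p + (my_rank < n % p ? 1 : 0);
  // number of trapezoids owned by all lower-ranked processes
  int offset = my_rank * (n / p) + std::min(my_rank, n % p);
  local_a = a + offset * h;
  local_b = local_a + local_n * h; // rank p-1 ends at b (up to rounding)
}

The partial results still have to be summed, but with this partition the totals for different p agree to within rounding, rather than differing by whole missing trapezoids.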

When doing floating-point arithmetic the order of operations matters. In real arithmetic (I mean the arithmetic of real numbers) a+b+c+d == a+c+d+b (and any other ordering of the additions). This is not necessarily true for floating-point arithmetic. Since MPI doesn't guarantee to do reductions from M processors to 1 processor in the same order every time, its floating-point behaviour is non-deterministic, at least as far as most of us need be concerned.
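A tiny self-contained demonstration of the ordering effect (my example, not from the answer): adding the same three numbers with different groupings gives different rounded results, because 1.0e16 + 1.0 rounds back to 1.0e16 (the tie goes to the even neighbour), while 2.0 is large enough to survive.

#include <cstdio>

int main()
{
  double big  = 1.0e16;
  double tiny = 1.0;
  double left  = (big + tiny) + tiny; // each tiny is absorbed: 1e16
  double right = big + (tiny + tiny); // 2.0 survives: 1e16 + 2
  std::printf("left  = %.1f\nright = %.1f\n", left, right);
  return 0;
}

Incidentally, the hand-rolled receive loop in the question can be replaced with a single collective call, MPI_Reduce(&integral, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); note that the MPI standard still leaves the order in which the partial sums are combined, and hence the exact rounded result, up to the implementation.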

Putting that to one side, the differences between the results on varying numbers of processors do look rather larger than I would expect. Looking at your code, I think you use integer arithmetic in this line:

local_n = n/p;

which leads to small parts of the total area not being assigned to any process for calculation. By my reckoning, the line

local_b = local_a + local_n*h;

does not set local_b to 1.0 for the last process.
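To put numbers on that, take the question's n = 50000, p = 3 run: h = 1/50000 = 2.0e-5 and local_n = 50000/3 = 16666 (integer division), so the last rank gets local_a = 2 * 16666 * h = 0.66664 and local_b = 0.66664 + 16666 * h = 0.99996 rather than 1.0. The strip [0.99996, 1.0] is never integrated by any process; with f(x) close to 2 there, that is an area of roughly 2 * 4.0e-5 = 8.0e-5, the same order as the drift in the outputs above.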
