
OpenMP serial version of the code is faster than the parallel version, how can I fix it?

Here's a program that does some calculation, and I am trying to use threads to make it run faster, but I cannot make it run faster than the serial version.

The serial output is:

Function: y(x) = sin(x) [note that x is in radians]
Limits of integration: x1 = 7.000000, x2 = 2.000000 
Riemann sum with 5000 steps, a.k.a. dx=-0.001000 
Computed integral: 1.170175 
Exact integral:    1.170049 
Percent error:     0.010774 % 
Work took 0.182533 milliseconds

The parallel output is:

Function: y(x) = sin(x) [note that x is in radians]
Limits of integration: x1 = 7.000000, x2 = 2.000000 
Riemann sum with 5000 steps, a.k.a. dx=-0.001000 
Computed integral: 1.170175 
Exact integral:    1.170049 
Percent error:     0.010774 % 
Work took 0.667334 milliseconds

Here's the code:

#include <stddef.h>  // for size_t
#include <stdio.h>
#include <stdlib.h>     /* atoi, atof */
#include <math.h>
#include "omp.h" // just used for timing

int main(int argc, char *argv[]) {
   double start, end;
   start = omp_get_wtime(); // Start our work timer

   // BEGIN TIMED CODE BLOCK
   double x, y;
   double x1, x2; // Limits of integration
   double dx;
   double ysum, integral;
   size_t i;
   size_t nsteps;

   // Read in command line arguments
   x1 = atof(argv[1]); // lower x limit
   x2 = atof(argv[2]); // upper x limit
   nsteps = atof(argv[3]); // number of steps in Riemann sum
   omp_set_num_threads(2);

   // Compute delta x
   dx = (x2 - x1)/nsteps; // delta x for the Riemann sum

   // Perform numeric integration via Riemann sum
   // Temporary variable to hold the sum prior to multiplication by dx
   ysum = 0;
   #pragma omp parallel shared(ysum) private(x,y)
   {
      #pragma omp for
      for (i=0; i<nsteps; i++) {
         x = x1 + i*dx; // x value at this step
         y = sin(x); // y(x) at this step; note that x is always in radians
         #pragma omp critical
         ysum += y; // summation of y(x)
      }
      #pragma omp critical
      integral = ysum * dx; // Our computed integral: the summation of y(x)*dx
      // END TIMED CODE BLOCK
   }


   end = omp_get_wtime(); // Stop our work timer

   double analytic = -cos(x2) + cos(x1); // The known, exact answer to this integration problem

   printf("Function: y(x) = sin(x) [note that x is in radians]\n");
   printf("Limits of integration: x1 = %lf, x2 = %lf \n", x1, x2);
   printf("Riemann sum with %ld steps, a.k.a. dx=%lf \n", nsteps, dx); 
   printf("Computed integral: %lf \n", integral);
   printf("Exact integral:    %lf \n", analytic);
   printf("Percent error:     %lf %% \n", fabs((integral - analytic) / analytic)*100);
   printf("Work took %f milliseconds\n", 1000 * (end - start));
   return 0;
}

The output changes when I remove the critical sections, so I assume I did the right thing there.

Every time you have #pragma omp critical you impose a barrier to effective multithreading. You can use the #pragma omp parallel for directive, along with a reduction clause, to parallelize your loop.

#pragma omp parallel for reduction(+:ysum)
for (int i = 0; i < nsteps; ++i) {
    double x = x1 + i * dx;
    double y = sin(x);
    ysum += y;
}

integral = ysum * dx;

The temporary variables that are used within the loop are declared there, so that each thread will have its own copy of them (the loop body can be rewritten to not need x or y). The reduction clause will (in this instance) keep a separate ysum value for each thread, then at the end add all those values together.
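
For reference, here is a minimal end-to-end sketch of the question's program with the reduction applied. It keeps the same command-line interface (x1, x2, nsteps); the file name riemann.c used in the compile command below is a hypothetical choice, not from the original post.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    if (argc < 4) {
        fprintf(stderr, "usage: %s x1 x2 nsteps\n", argv[0]);
        return 1;
    }

    double x1 = atof(argv[1]);   // lower x limit
    double x2 = atof(argv[2]);   // upper x limit
    long nsteps = atol(argv[3]); // number of steps in the Riemann sum
    double dx = (x2 - x1) / nsteps;

    double start = omp_get_wtime(); // Start the work timer

    double ysum = 0.0;
    // Each thread accumulates into its own private copy of ysum;
    // OpenMP adds the per-thread copies together after the loop,
    // so no critical section is needed.
    #pragma omp parallel for reduction(+:ysum)
    for (long i = 0; i < nsteps; ++i) {
        ysum += sin(x1 + i * dx);
    }
    double integral = ysum * dx;

    double end = omp_get_wtime(); // Stop the work timer

    double analytic = -cos(x2) + cos(x1); // Exact answer for comparison
    printf("Computed integral: %lf \n", integral);
    printf("Exact integral:    %lf \n", analytic);
    printf("Percent error:     %lf %% \n", fabs((integral - analytic) / analytic) * 100);
    printf("Work took %f milliseconds\n", 1000 * (end - start));
    return 0;
}

With GCC this builds as gcc -fopenmp -O2 riemann.c -o riemann -lm and runs as ./riemann 7 2 5000, matching the invocation implied by the outputs above.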
