Parallel code with OpenMP takes more time to execute than serial code

Question

I'm trying to make this code run in parallel. It's a chunk of code from a big project. I decided to start parallelizing slowly, so I could find any problems step by step (I don't know if that's a good tactic, so please let me know).

double best_nearby(double delta[MAXVARS], double point[MAXVARS], double prevbest, int nvars)
{
    double z[MAXVARS];
    double minf, ftmp;
    int i;
    minf = prevbest;
    omp_set_num_threads(NUM_THREADS);
    
    /* the only parallel loop: a plain copy of point[] into z[] */
    #pragma omp parallel for shared(nvars,point,z) private(i)
    for (i = 0; i < nvars; i++)
        z[i] = point[i];
    for (i = 0; i < nvars; i++) {
        z[i] = point[i] + delta[i];
        ftmp = f(z, nvars);
        if (ftmp < minf)
            minf = ftmp;
        else {
            delta[i] = 0.0 - delta[i];
            z[i] = point[i] + delta[i];
            ftmp = f(z, nvars);
            if (ftmp < minf)
                minf = ftmp;
            else
                z[i] = point[i];
        }
    }
    for (i = 0; i < nvars; i++)
        point[i] = z[i];

    return (minf);
}

NUM_THREADS is set with a #define.

The function has a few more lines, but they are identical in the parallel and serial versions.

On average the serial code takes about 130 s, while the parallel version takes around 400 s. It baffles me that such a small change can increase the execution time so much. Any ideas on why this happens? Thank you in advance!

double f(double *x, int n){
    double fv;
    int i;

    funevals++;
    fv = 0.0;
    for (i = 0; i < n-1; i++)   /* rosenbrock */
        fv = fv + 100.0*pow((x[i+1]-x[i]*x[i]),2) + pow((x[i]-1.0),2);

    return fv;
}
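For reference, the loop in f evaluates the extended Rosenbrock function (the code comment calls it "rosenbrock"). Written out with the same 0-based indices as the code:

```latex
f(x) = \sum_{i=0}^{n-2} \left[ 100\,\bigl(x_{i+1} - x_i^{2}\bigr)^{2} + \bigl(x_i - 1\bigr)^{2} \right]
```

Its global minimum is f = 0 at x = (1, ..., 1). Each term depends only on x_i and x_{i+1} and none of them writes to x, so the terms can be summed in any order, which is what makes this loop a candidate for an OpenMP reduction (the result may differ from the serial sum in the last few bits because the additions are reordered).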

Answer 1

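Currently, you are not parallelizing much. The only loop you parallelized just copies nvars doubles, so the cost of entering the parallel region on every call to best_nearby (waking the threads and synchronizing them at the end) most likely outweighs the tiny amount of work being split. You can start by parallelizing the f function, since it looks computationally demanding: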

double f(double *x, int n){
..
  double fv = 0.0;

  #pragma omp parallel for reduction(+:fv)
  for (int i=0; i<n-1; i++)
    fv = fv + 100.0*pow((x[i+1]-x[i]*x[i]),2) + pow((x[i]-1.0),2);

  return fv;
}
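A self-contained sketch of the suggestion above, under a few stated assumptions: funevals is a global counter as in the question's f; PAR_THRESHOLD is a hypothetical cut-off that does not come from the answer; and pow(..., 2) is replaced by a plain multiplication. The OpenMP if clause keeps the loop serial for small n, where starting a parallel region would probably cost more than it saves. Without -fopenmp the pragma is ignored and the function is simply serial.

```c
#include <assert.h>

long funevals = 0;          /* assumed global evaluation counter, as in the question */

#define PAR_THRESHOLD 10000 /* hypothetical cut-off; tune it by measuring */

/* Extended Rosenbrock function summed with an OpenMP reduction. */
double f(double *x, int n)
{
    double fv = 0.0;

    assert(n >= 0);
    funevals++;             /* one increment per call, kept outside the parallel loop */

    /* reduction(+:fv): each thread accumulates into a private copy of fv,
       and the copies are added together when the loop ends.
       if(...): the loop only runs in parallel when n is large enough. */
    #pragma omp parallel for reduction(+:fv) if(n > PAR_THRESHOLD)
    for (int i = 0; i < n - 1; i++) {
        double a = x[i + 1] - x[i] * x[i];
        double b = x[i] - 1.0;
        fv += 100.0 * a * a + b * b;
    }

    return fv;
}
```

Keeping funevals++ outside the loop matters: incrementing a shared global from inside the parallel loop would be a data race.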

Test and check the results. After that, you can try to expand the scope of the parallelization to also include the outermost loop.
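Before widening the parallel region, it may help to measure what one parallel region costs on your machine. The sketch below times the question's copy loop with and without #pragma omp parallel for; the names copy_serial, copy_parallel, time_copies and wall_seconds are hypothetical helpers, not part of the original code. With a small n the parallel version is typically the slower one, because every call has to wake the worker threads and wait at the implicit barrier at the end of the loop. No specific timings are claimed here.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds (omp_get_wtime when built with OpenMP). */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* The question's copy loop, run serially. */
static void copy_serial(double *z, const double *point, int n)
{
    for (int i = 0; i < n; i++)
        z[i] = point[i];
}

/* The same loop inside a parallel region, as in best_nearby. */
static void copy_parallel(double *z, const double *point, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        z[i] = point[i];
}

/* Calls copy(z, point, n) `calls` times and returns the elapsed seconds. */
static double time_copies(void (*copy)(double *, const double *, int),
                          double *z, const double *point, int n, int calls)
{
    assert(calls >= 0);
    double t0 = wall_seconds();
    for (int c = 0; c < calls; c++)
        copy(z, point, n);
    return wall_seconds() - t0;
}
```

A minimal driver would call time_copies(copy_serial, z, point, 8, 100000) and time_copies(copy_parallel, z, point, 8, 100000) from main and print both values; building it once with and once without -fopenmp (for GCC) shows the overhead directly.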

Answer 1 (score: 1, accepted) 2021-05-20 19:21:18