How to parallelize a C program for multiple CPUs

Question

The result often looks wrong: the value of 'bmmin' after the parallelization seems to be incorrect.

    #pragma omp parallel private(thread_id, bmmin, r ,t, am, b, bm)
    {
    thread_id=omp_get_thread_num();
    bmmin=INFINITY;
    for (i=0; i<nel; i++) {
      am=a[i]+ldigitse*j;
      b=roundl((lval-am)/ldigits0);
      bm=fabsl(am+b*ldigits0-lval);
      if (bm<bmmin) {
        bmmin=bm;
        t[0]=(int)b;
        r=ldigits[0]*t[0];
        for (l=1; l<ndig; l++) {
          t[l]=(*s)[i][l-1];
          r=r+ldigits[l]*t[l];
        };
        t[ndig]=j;
        r=r+ldigits[ndig]*t[ndig];
      };
    };
    // bmmin result looks almost same in many threads, why?
    printf("Thread %d: r=%Lg, bmmin=%Lg, bmmin_glob=%Lg\n",thread_id,powl(10,r),bmmin,bmmin_glob);
    #pragma omp critical
    if (bmmin<bmmin_glob) {
      printf("Thread %d - giving minimum r=%9Lg!\n",thread_id,powl(10,r));
      bmmin_glob=bmmin;
      r_glob=r;
      for (i=0; i<=ndig; i++) {
        t_glob[i]=t[i];
      };
    };
    };

Running the code produces the following output:

    Initializing the table of the logarithmic constants...
    Calculation started for k from 0 to 38...
    j,k=-19,0
    Thread 7: r=2.57008e+30, bmmin=2.96034e-05, bmmin_glob=inf
    Thread 7 - giving minimum r=2.57008e+30!
    Thread 1: r=3.74482e+16, bmmin=2.96034e-05, bmmin_glob=inf
    Thread 6: r=3.74482e+16, bmmin=2.96034e-05, bmmin_glob=inf
    Thread 3: r=3.1399, bmmin=0.000234018, bmmin_glob=inf
    Thread 2: r=3.74482e+16, bmmin=2.96034e-05, bmmin_glob=inf
    Thread 5: r=3.1399, bmmin=0.000234018, bmmin_glob=inf
    Thread 4: r=392.801, bmmin=0.000113243, bmmin_glob=inf
    Thread 0: r=3.14138, bmmin=2.96034e-05, bmmin_glob=2.96034e-05
    Result:    2.57008e+30
    Exponents: 2^129*3^-13*5^16*7^-19
    j,k=-18,1

Many threads report bmmin=2.96034e-05, even though the r value varies widely between them.

Answer 1

"bmmin result looks almost same in many threads, why?"

This is because bmmin is declared private in the parallel region of the code. The same applies to thread_id and to the other listed variables, such as r. Each thread gets its own copy of a private variable, and that copy is accessible only from that thread. Because every thread runs the whole loop, most of the private copies end up holding the same minimum. If you want to make the result of each thread accessible to the main thread, you need to store the values in a shared array indexed by thread number. Alternatively, you can use an OpenMP reduction.
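As a rough illustration of the reduction approach, the sketch below finds a minimum distance with a reduction(min:...) clause, which needs OpenMP 3.1 or later in C. The function min_abs_diff and its parameters are hypothetical stand-ins rather than names from the question's code, and it uses double instead of long double for brevity.

```c
#include <math.h>

/* Hypothetical helper (not from the question): returns the smallest
   |x[i] - target| over the n values in x.
   reduction(min:best) gives every thread its own private copy of best
   and combines all copies into the shared variable when the loop ends. */
double min_abs_diff(const double *x, int n, double target)
{
    double best = INFINITY;

    #pragma omp parallel for reduction(min:best)
    for (int i = 0; i < n; i++) {        /* i is private to each thread */
        double d = fabs(x[i] - target);  /* declared inside: private too */
        if (d < best)
            best = d;
    }
    return best;
}
```

A reduction like this only delivers the minimum value itself. Carrying along associated data, such as r and t in the question, still needs a per-thread result that is merged by hand.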

"[...] looks like the 'i' values are out of the range of the for loop [...]"

Variables declared outside a parallel region are implicitly shared by default. This means i is shared, so there is a race condition on i. You need to either make it private or declare it inside the parallel region, so that each thread has its own copy.
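A minimal sketch of the second option, declaring the loop counter inside the parallel region. The function all_threads_iterate_fully is a hypothetical name used only for illustration. As in the question, every thread runs the full loop, but each one now has its own i, so no thread can disturb another thread's counter.

```c
/* Hypothetical check (not from the question): returns 1 if every thread
   performed exactly nel iterations of its own loop, 0 otherwise. */
int all_threads_iterate_fully(int nel)
{
    int bad = 0;                        /* shared: declared outside the region */

    #pragma omp parallel
    {
        int steps = 0;                  /* private: declared inside the region */
        for (int i = 0; i < nel; i++)   /* i is private for the same reason */
            steps++;

        /* bad is shared, so updates to it must be serialized */
        #pragma omp critical
        {
            if (steps != nel)
                bad++;
        }
    }
    return bad == 0;
}
```

Moving the declaration of i back outside the region, without a private(i) clause, would reintroduce the race described above.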

Note that an omp parallel region does not by itself share the work between threads: every thread executes the whole block. You need to either use a parallel for or divide the work yourself (e.g. splitting nel so that each thread computes a part of the loop, if this is what you want).

Besides this, #pragma omp critical does nothing outside a parallel region. It might be useful to use two directives: a #pragma omp parallel for the region as a whole and a #pragma omp for for the loop inside it.
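The two-directive layout could look roughly like the sketch below: omp parallel opens the region, omp for splits the loop iterations between the threads, and omp critical protects the merge into the shared result. The function closest_index and its variables are hypothetical and much simpler than the question's loop body. Ties are resolved towards the lower index so that the result does not depend on thread scheduling.

```c
#include <math.h>

/* Hypothetical helper (not from the question): index of the element of x
   closest to target, or -1 if n <= 0. */
int closest_index(const double *x, int n, double target)
{
    double best_glob = INFINITY;   /* shared result, updated under critical */
    int idx_glob = -1;

    #pragma omp parallel
    {
        double best = INFINITY;    /* per-thread minimum */
        int idx = -1;              /* per-thread index of that minimum */

        #pragma omp for            /* iterations are divided between threads */
        for (int i = 0; i < n; i++) {
            double d = fabs(x[i] - target);
            if (d < best) {
                best = d;
                idx = i;
            }
        }

        #pragma omp critical       /* one thread at a time merges its result */
        {
            if (best < best_glob ||
                (best == best_glob && idx >= 0 && idx < idx_glob)) {
                best_glob = best;
                idx_glob = idx;
            }
        }
    }
    return idx_glob;
}
```

Unlike a plain min reduction, this layout lets each thread keep extra data alongside its minimum (here only an index) and hand it over in the critical section.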
