How to parallelize a C program for multiple CPUs
The result often looks wrong; the 'bmmin' value seems to be incorrect after the parallelization.
#pragma omp parallel private(thread_id, bmmin, r ,t, am, b, bm)
{
    thread_id=omp_get_thread_num();
    bmmin=INFINITY;
    for (i=0; i<nel; i++) {
        am=a[i]+ldigitse*j;
        b=roundl((lval-am)/ldigits0);
        bm=fabsl(am+b*ldigits0-lval);
        if (bm<bmmin) {
            bmmin=bm;
            t[0]=(int)b;
            r=ldigits[0]*t[0];
            for (l=1; l<ndig; l++) {
                t[l]=(*s)[i][l-1];
                r=r+ldigits[l]*t[l];
            };
            t[ndig]=j;
            r=r+ldigits[ndig]*t[ndig];
        };
    };
    // bmmin result looks almost same in many threads, why?
    printf("Thread %d: r=%Lg, bmmin=%Lg, bmmin_glob=%Lg\n",thread_id,powl(10,r),bmmin,bmmin_glob);
    #pragma omp critical
    if (bmmin<bmmin_glob) {
        printf("Thread %d - giving minimum r=%9Lg!\n",thread_id,powl(10,r));
        bmmin_glob=bmmin;
        r_glob=r;
        for (i=0; i<=ndig; i++) {
            t_glob[i]=t[i];
        };
    };
};
When I run the code, it prints:
Initializing the table of the logarithmic constants...
Calculation started for k from 0 to 38...
j,k=-19,0
Thread 7: r=2.57008e+30, bmmin=2.96034e-05, bmmin_glob=inf
Thread 7 - giving minimum r=2.57008e+30!
Thread 1: r=3.74482e+16, bmmin=2.96034e-05, bmmin_glob=inf
Thread 6: r=3.74482e+16, bmmin=2.96034e-05, bmmin_glob=inf
Thread 3: r=3.1399, bmmin=0.000234018, bmmin_glob=inf
Thread 2: r=3.74482e+16, bmmin=2.96034e-05, bmmin_glob=inf
Thread 5: r=3.1399, bmmin=0.000234018, bmmin_glob=inf
Thread 4: r=392.801, bmmin=0.000113243, bmmin_glob=inf
Thread 0: r=3.14138, bmmin=2.96034e-05, bmmin_glob=2.96034e-05
Result: 2.57008e+30
Exponents: 2^129*3^-13*5^16*7^-19
j,k=-18,1
Many threads report bmmin=2.96034e-05, even though their r values vary widely.
bmmin result looks almost same in many threads, why?
This is because bmmin is declared as a private variable in the parallel section of the code. The same applies to thread_id and to other variables such as r. A private variable has a separate copy in each thread, and that copy is accessible only from its own thread. If you want to make each thread's result available to the main thread, you need to store the values in a shared array. Alternatively, you can use OpenMP reductions.
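As a rough illustration of the reduction approach, here is a minimal sketch, not the asker's actual computation. The function name min_abs_error and the plain double array are assumptions made up for the example. Note that a min reduction only returns the smallest value; if you also need the data that goes with it (such as r and t in the question), a merge step is still required.

```c
#include <math.h>

/* Hypothetical helper (not from the question): returns the smallest
 * |x[i] - target| over n values, for n > 0. */
double min_abs_error(const double *x, int n, double target)
{
    double best = INFINITY;

    /* Each thread works on a private copy of `best`; OpenMP combines
     * the copies with min when the loop ends. If the compiler ignores
     * the pragma, the loop simply runs serially. */
    #pragma omp parallel for reduction(min:best)
    for (int i = 0; i < n; i++) {
        double err = fabs(x[i] - target);
        if (err < best)
            best = err;
    }
    return best;
}
```

With GCC this would typically be compiled with -fopenmp; the min reduction operator needs OpenMP 3.1 or later.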
[...] looks like the 'i' values are out of the range of the for loop
Variables are implicitly shared by default in parallel sections. This means i is shared by default, so there is a race condition on i. You need to make it private, or declare it inside the parallel section, so that each thread has its own copy. The inner loop counter l is not in the private list either, so it has the same problem if it is declared outside the parallel section.
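A minimal sketch of the second option, declaring the counter inside the parallel section. The function count_iterations is hypothetical and exists only to make the effect observable: because each thread has its own i, every thread performs exactly n iterations, so the total is n times the number of threads.

```c
/* Hypothetical demo function (not from the question). */
long count_iterations(int n)
{
    long total = 0;

    #pragma omp parallel
    {
        long local = 0;

        /* `i` is declared inside the parallel region, so each thread
         * gets its own copy and there is no race on the counter.
         * Listing an outer `i` in a private(...) clause would have
         * the same effect. */
        for (int i = 0; i < n; i++)
            local++;

        /* `total` is shared, so the update must be protected. */
        #pragma omp atomic
        total += local;
    }
    return total;
}
```

If i were a shared variable instead, threads would overwrite each other's counter and the number of iterations per thread would be unpredictable.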
Note that an omp parallel section by itself does not share the work between threads: every thread executes the whole block. You need to either use a parallel for or divide the work yourself (e.g. by splitting nel so that each thread computes part of the loop, if that is what you want).
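The two options can be sketched side by side on a simple sum. Both function names are assumptions for illustration, and the chunking formula in the manual version is just one possible way to split the range. The #ifdef _OPENMP guard lets the code fall back to a single "thread" when it is compiled without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Option 1: split the index range by hand (hypothetical helper). */
double sum_manual_split(const double *a, int n)
{
    double total = 0.0;

    #pragma omp parallel
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();   /* this thread's index */
        int nth = omp_get_num_threads();  /* threads in this team */
#else
        int tid = 0, nth = 1;             /* serial fallback */
#endif
        /* Contiguous chunk [lo, hi) owned by this thread. */
        int lo = (int)((long long)n * tid / nth);
        int hi = (int)((long long)n * (tid + 1) / nth);

        double local = 0.0;
        for (int i = lo; i < hi; i++)
            local += a[i];

        #pragma omp atomic
        total += local;
    }
    return total;
}

/* Option 2: let OpenMP distribute the iterations (hypothetical helper). */
double sum_parallel_for(const double *a, int n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The parallel for version is usually the simpler choice; manual splitting is mainly useful when each thread needs to know exactly which range it owns.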
Besides this, #pragma omp critical does nothing outside a parallel section. It might be useful to use two directives: a #pragma omp parallel to open the parallel section and a #pragma omp for inside it to distribute the loop iterations.
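One way the two directives could fit together with a critical section is sketched below. It is a simplified stand-in for the question's loop, not a drop-in fix: the best_t struct, the function closest_to, and the use of a plain double array are all assumptions. Each thread keeps a local best (value plus index), and the locals are merged into the shared result inside the parallel section.

```c
#include <math.h>

/* Hypothetical result type: smallest error and where it was found. */
typedef struct {
    double err;
    int index;
} best_t;

/* Hypothetical helper: finds the element of x closest to target. */
best_t closest_to(const double *x, int n, double target)
{
    best_t global = { INFINITY, -1 };

    #pragma omp parallel
    {
        /* Declared inside the region, so private to each thread. */
        best_t local = { INFINITY, -1 };

        /* The iterations are divided among the threads of the team. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            double err = fabs(x[i] - target);
            if (err < local.err) {
                local.err = err;
                local.index = i;
            }
        }

        /* One thread at a time merges its local best into the shared
         * result; this only has an effect inside the parallel region. */
        #pragma omp critical
        {
            if (local.err < global.err)
                global = local;
        }
    }
    return global;
}
```

If several elements tie for the minimum, the index that wins depends on the order in which threads reach the critical section, so the result is only deterministic when the minimum is unique.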