C++ OpenMP computation errors with private and shared clause
I have a for loop that I want to parallelize with OpenMP, but the parallel version produces several computational errors, probably because I do not fully understand how multithreading works in OpenMP:
for ( int i = -X/2; i < X/2; ++i )
{
    base.y = anchor + i*rho_step;
    temp = some_function( base );
    if( temp > response )
    {
        buffer.y = base.y;
        response = temp;
    }
}
This works fine. I then made the following changes:
#pragma omp parallel for shared (buffer, response) private(base, temp)
for ( int i = -X/2; i < X/2; ++i )
{
    base.y = anchor + i*rho_step;
    temp = some_function( base );
    if( temp > response )
    {
        buffer.y = base.y;
        response = temp;
    }
}
In this code, neither buffer.y nor response ends up with the correct value. As I understand it, every thread should have its own copy of base.y and temp, since they are only temporary variables for the computation, while buffer and response must be shared because they store the computed result. But this does not work as I expected.
The only version that gives correct results is the following, but obviously there is no performance gain:
omp_lock_t writelock;
omp_init_lock(&writelock);
omp_set_num_threads (4);
#pragma omp parallel for
for ( int i = -X/2; i < X/2; ++i )
{
    omp_set_lock(&writelock);
    base.y = anchor + i*rho_step;
    temp = some_function( base );
    if( temp > response )
    {
        buffer.y = base.y;
        response = temp;
    }
    omp_unset_lock(&writelock);
}
omp_destroy_lock(&writelock);
What could be the problem? (anchor and rho_step are constants in this loop.)
To handle the concurrent updates to the buffer and response variables correctly, you need per-thread local copies of them, followed by a final reduction that merges each thread's result into the shared variables. In your second version, all threads read and write the shared buffer and response without synchronization, so the comparison temp > response and the two assignments that follow it can interleave between threads and leave an inconsistent result.
Here is what it would look like (not tested):
#pragma omp parallel firstprivate( base )
{
    // Each thread tracks its own best candidate in local copies.
    auto localResponse = response;
    auto localBuffer = buffer;

    #pragma omp for
    for ( int i = -X/2; i < X/2; ++i )
    {
        base.y = anchor + i * rho_step;
        auto temp = some_function( base );
        if ( temp > localResponse )
        {
            localBuffer.y = base.y;
            localResponse = temp;
        }
    }

    // Merge the per-thread results into the shared variables, one thread at a time.
    #pragma omp critical
    {
        if ( localResponse > response )
        {
            buffer.y = localBuffer.y;
            response = localResponse;
        }
    }
}
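For anyone who wants to try this out, here is a self-contained sketch of the same idea. It is only an illustration under assumptions: Point, Best, the body of some_function, and the two find_best_* helpers are made-up stand-ins, not the asker's actual code. The first helper follows the per-thread local plus critical pattern above. The second expresses the same merge as a user-defined reduction, which needs OpenMP 4.0 or later. Compile with -fopenmp on GCC or Clang; without that flag the pragmas are ignored and both helpers simply run serially.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for the asker's type; only `y` matters here.
struct Point { double x = 0.0; double y = 0.0; };

// Best candidate found so far: the y position and its response.
struct Best { double y; double response; };

// Assumed scoring function for illustration only; it peaks at y == 3.0.
double some_function(const Point& p) {
    const double d = p.y - 3.0;
    return -(d * d);
}

// Variant 1: per-thread local best, merged under a critical section.
Best find_best_critical(int X, double anchor, double rho_step) {
    assert(X >= 2);  // the loop below would be empty otherwise
    const double lowest = -std::numeric_limits<double>::infinity();
    Best best{anchor, lowest};

    #pragma omp parallel
    {
        Best local{anchor, lowest};  // private to each thread

        #pragma omp for
        for (int i = -X / 2; i < X / 2; ++i) {
            Point base;  // declared inside the loop, so private
            base.y = anchor + i * rho_step;
            const double temp = some_function(base);
            if (temp > local.response) {
                local = Best{base.y, temp};
            }
        }

        // One thread at a time merges its result into the shared one.
        #pragma omp critical
        {
            if (local.response > best.response) {
                best = local;
            }
        }
    }
    return best;
}

// Variant 2: the same merge written as a user-defined reduction.
#pragma omp declare reduction(best_max : Best : \
        omp_out = (omp_in.response > omp_out.response ? omp_in : omp_out)) \
    initializer(omp_priv = omp_orig)

Best find_best_reduction(int X, double anchor, double rho_step) {
    assert(X >= 2);
    Best best{anchor, -std::numeric_limits<double>::infinity()};

    // Each thread updates its own private `best`; OpenMP combines them.
    #pragma omp parallel for reduction(best_max : best)
    for (int i = -X / 2; i < X / 2; ++i) {
        Point base;
        base.y = anchor + i * rho_step;
        const double temp = some_function(base);
        if (temp > best.response) {
            best = Best{base.y, temp};
        }
    }
    return best;
}
```

With the assumed scoring function, calling either helper as find_best_critical(20, 0.0, 0.5) scans y from -5.0 to 4.5 and should settle on y == 3.0. One caveat: if several iterations share the same maximum response, the serial loop keeps the first one, whereas in both parallel variants the winner may depend on how iterations are distributed across threads.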