
Cannot generate histogram with OpenMP locks

I am trying to learn how to use OpenMP locks by testing them on a program that generates a histogram from some data.

I generate some random numbers with the following:

unsigned long seed = 9876543210 ;

std::mt19937_64 gen {seed} ;

float mu    = 5.0 ;
float sigma = 1.0 ;

std::normal_distribution<float> dist {mu, sigma} ;

int N = 5000 ;

float array[N] ;

for( int i=0; i<N; ++i ) array[i] = dist( gen ) ; // store random numbers

To generate the histogram (within a range [min, max]) with OpenMP locks, I do the following:

int bins = 5  ;
int hist[bins] ;
int ival ;

for( ival=0; ival<bins; ++ival ) hist[ival] = 0 ;

omp_lock_t locks[bins] ;

for( ival=0; ival<bins; ++ival ) omp_init_lock(&locks[ival]) ;

float min   = mu - 3*sigma       ;
float max   = mu + 3*sigma       ;
float scale = (max - min) / bins ;

int i ;

#pragma omp parallel for num_threads( 4 )
for( i=0; i<N; ++i ) {

    ival = (int) floorf( (array[i] - min) / scale ) ; // bin index

    if( ival < 0 || ival >= bins ) continue ;

    omp_set_lock(&locks[ival]) ;
    hist[ival]++ ;
    omp_unset_lock(&locks[ival]) ;

}

for( ival=0; ival<bins; ++ival ) omp_destroy_lock(&locks[ival]) ;

This program takes exceedingly long to run, so I had to quit it before it could finish. The serial version takes an instant and runs just fine.

What am I doing wrong here?

The compilation is done with g++ using the flags:

-std=c++11 -O2 -fopenmp

Thanks in advance.

Your use of locks is technically correct in the sense that they do what you intend them to do. However, locks are extremely slow when used in this manner. Highly contended locks like these are always slow. Even when uncontended, they require at least one atomic instruction to lock and one to unlock, and atomic instructions run at a throughput of roughly one every 20 cycles. A plain serial histogram update may run at one per cycle, or one every few cycles (due to store-to-load forwarding).

The solution is to use one partial histogram per thread, then accumulate all the partial histograms at the end. This is described in many answers on SO. See for example
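As a minimal sketch of the per-thread approach described here (written for illustration, not taken from any linked answer): each thread fills its own private histogram without synchronization, then adds it to the shared result inside a critical section. The function name `histogram_per_thread` and its parameters are hypothetical. The binning follows the question's formula.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not from the original answer): each thread fills a
// private partial histogram; the partials are summed into `hist` at the end.
std::vector<int> histogram_per_thread(const std::vector<float>& data,
                                      float min, float max, int bins)
{
    std::vector<int> hist(bins, 0);
    const float scale = (max - min) / bins;
    const int n = static_cast<int>(data.size());

    #pragma omp parallel num_threads(4)
    {
        std::vector<int> local(bins, 0);   // private to this thread

        #pragma omp for nowait
        for (int i = 0; i < n; ++i) {
            // Bin index is local to the iteration, so it is never shared.
            const int ival = static_cast<int>(std::floor((data[i] - min) / scale));
            if (ival >= 0 && ival < bins) local[ival]++;   // no lock needed
        }

        // Merge once per thread: the critical section is entered once by
        // each thread rather than once per sample.
        #pragma omp critical
        for (int b = 0; b < bins; ++b) hist[b] += local[b];
    }
    return hist;
}
```

It can be called as `histogram_per_thread(samples, mu - 3*sigma, mu + 3*sigma, 5)` and compiled with the question's flags. If OpenMP is disabled, the pragmas are ignored and the function runs serially with the same result.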
