OpenMP num_threads(1) executes faster than no OpenMP

I've run my code under a variety of circumstances, and the results show what I believe to be odd behavior. My testing was on a dual-core Intel Xeon processor with HT.

No OpenMP '#pragma' statement, total runtime = 507 seconds

With OpenMP '#pragma' statement specifying 1 thread, total runtime = 117 seconds

With OpenMP '#pragma' statement specifying 2 threads, total runtime = 150 seconds

With OpenMP '#pragma' statement specifying 3 threads, total runtime = 157 seconds

With OpenMP '#pragma' statement specifying 4 threads, total runtime = 144 seconds

I can't figure out why commenting out my OpenMP line slows the program down so much: one thread without OpenMP is far slower than one thread with OpenMP.

All I am changing is between:

//#pragma omp parallel for shared(segs) private(i, j, p_hough) num_threads(1) schedule(guided)

and...

#pragma omp parallel for shared(segs) private(i, j, p_hough) num_threads(1,2,3,4) schedule(guided)

Anyway, if anyone has any idea why this may be happening, please let me know!

Thanks for any help,

Brett

EDIT: I'll address some of the comments here.

I am using num_threads(1), num_threads(2), etc.

On further investigation, it turns out that my results are inconsistent depending on whether or not the schedule(guided) clause is included in the code.

- When I use the schedule(guided) clause, I get the fastest runs, regardless of the number of threads.
- When I use the default schedule, my runs are significantly slower and produce different values.
- With schedule(guided), adding threads gives no improvement.
- Without schedule(guided), adding threads does give an improvement.

I guess I haven't found a good enough description of what schedule(guided) does for me. My understanding is that it tries to split up the loop so that the most time-intensive iterations happen first, which should minimize the time one thread spends waiting for the others to complete their iterations.
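For reference, the OpenMP specification describes schedule(guided) in terms of chunk sizes, not iteration cost: each thread that asks for work receives a chunk proportional to the number of unassigned iterations divided by the number of threads, shrinking toward a minimum chunk size. The sketch below simulates that rule in plain C. The helper name `guided_chunks`, the proportionality constant of 1, and the ceiling rounding are assumptions for illustration; a real runtime may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenMP): simulate the chunk sizes a
 * guided schedule might hand out.
 * Assumption: chunk = ceil(remaining / nthreads), never below min_chunk
 * except for the final chunk. Real runtimes may use a different constant.
 * Writes at most max_chunks sizes into out; returns the number written. */
size_t guided_chunks(long total, int nthreads, long min_chunk,
                     long *out, size_t max_chunks)
{
    assert(total >= 0 && nthreads > 0 && min_chunk > 0);
    long remaining = total;
    size_t n = 0;
    while (remaining > 0 && n < max_chunks) {
        long chunk = (remaining + nthreads - 1) / nthreads; /* ceiling division */
        if (chunk < min_chunk) chunk = min_chunk;
        if (chunk > remaining) chunk = remaining;            /* last chunk */
        out[n++] = chunk;
        remaining -= chunk;
    }
    return n;
}
```

Under these assumptions, 900 iterations on 4 threads start with a chunk of 225, then 169, and keep shrinking; with 1 thread the whole loop is a single chunk of 900.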

It appears that for my ~900-iteration loop, when I use schedule(guided) I'm only processing ~200 iterations, whereas without schedule(guided) I'm processing all 900 iterations. Any thoughts?
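One way to check how many iterations actually run is to count them with a reduction, which is safe under any schedule. A schedule clause only decides which thread runs which iterations, so a conforming loop should execute the same number of iterations with or without schedule(guided). The sketch below is not the original Hough code: `count_iterations` and its loop body are placeholders.

```c
#include <assert.h>

/* Hypothetical check: count how many loop iterations execute.
 * Builds with or without OpenMP enabled; without it the pragma is ignored
 * and the loop runs serially. */
long count_iterations(long n, int nthreads)
{
    assert(n >= 0 && nthreads > 0);
    long count = 0;
    long i; /* loop variable of a parallel for is private automatically */
    #pragma omp parallel for num_threads(nthreads) schedule(guided) reduction(+:count)
    for (i = 0; i < n; i++) {
        /* the real per-iteration work would go here */
        count += 1;
    }
    return count;
}
```

If a count like this came back lower than the loop bound in the real program, that would suggest looking at the loop body itself (for example, variables shared between threads that should be private) instead of at the schedule.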

OpenMP has significant synchronization overheads. I have found that unless you have a really big loop that does a lot of work and has no intra-loop synchronization, it is generally not worthwhile using OpenMP.

I think that when you set the number of threads to one, OpenMP simply makes a procedure call to the OpenMP procedure implementing the loop, so the overhead is minimal and performance is essentially identical to the non-OpenMP case.

Otherwise, I think OpenMP sets some semaphores and the waiting "worker" threads wake up, synchronize their access to the data structures that tell them which loop parameters to use, and then call the routine that does the work. When they complete their chunk of work, they signal the master thread again. This synchronization must happen for each chunk of work that a thread does, and the synchronization costs are non-trivial.

Using the STATIC scheduling option can help reduce the scheduling/synchronization overheads, particularly if the number of loop iterations is large relative to the number of cores.
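A minimal sketch of the static option, assuming a loop whose iterations all cost roughly the same. With schedule(static) and no chunk size, the iteration space is divided into approximately equal chunks with at most one chunk per thread, so no chunks are handed out while the loop runs. The function `sum_squares` is a made-up stand-in for the real loop body.

```c
#include <assert.h>

/* Hypothetical example: sum of i*i for i in [0, n) with a static schedule.
 * Builds with or without OpenMP enabled; the result is the same either way. */
long long sum_squares(long n, int nthreads)
{
    assert(n >= 0 && nthreads > 0);
    long long total = 0;
    long i;
    #pragma omp parallel for num_threads(nthreads) schedule(static) reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (long long)i * i; /* stand-in for uniform-cost work */
    }
    return total;
}
```

If iteration costs vary a lot, static chunks can leave some threads idle while others finish, which is the case the dynamic and guided schedules are meant for.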
