
Why is my OpenMP implementation slower than a single threaded implementation?

Original post: 2011-02-18 14:17:54 · Tags: c, openmp

I am learning about OpenMP concurrency and tried it out on some existing code of mine. In this code, I tried to make all the for loops parallel. However, this seems to make the program MUCH slower: at least 10x slower than the single-threaded version, or even more.

Here is the code: http://pastebin.com/zyLzuWU2

I also tried pthreads, which turned out to be faster than the single-threaded version.

Now the question is: what am I doing wrong in my OpenMP implementation that is causing this slowdown?

Thanks!

Edit: the single-threaded version is just the same code without all the #pragmas.

3 Answers

One problem I see with your code is that you are using OpenMP on loops that are very small (8 or 64 iterations, for example). This will not be efficient because of the overhead. If you want to use OpenMP for the n-queens problem, look at OpenMP 3.0 tasks and thread parallelism for branch-and-bound problems.
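To illustrate the task-based approach mentioned above, here is a minimal sketch of counting n-queens solutions with OpenMP 3.0 tasks. It is not the asker's pastebin code: the function names `solve` and `count_queens`, the bitmask board representation, and the choice of one task per first-row column are all assumptions made for this example. The point is that each task carries a whole subtree of the search, so the cost of creating it is paid once per subtree rather than once per 8-iteration loop.

```c
#include <assert.h>

/* Hypothetical helper: serial backtracking over rows row..n-1.
   cols, d1 and d2 are bitmasks of the columns attacked in the current
   row by already-placed queens (vertically and along both diagonals). */
static long solve(int n, int row, unsigned cols, unsigned d1, unsigned d2)
{
    if (row == n)
        return 1;                               /* every row has a queen */

    long count = 0;
    unsigned all = (1u << n) - 1;               /* the n valid column bits */
    unsigned free_cols = all & ~(cols | d1 | d2);

    while (free_cols) {
        unsigned bit = free_cols & (0u - free_cols);   /* lowest free column */
        free_cols ^= bit;
        count += solve(n, row + 1, cols | bit,
                       (d1 | bit) << 1, (d2 | bit) >> 1);
    }
    return count;
}

/* Hypothetical entry point: one task per column of the first row. */
long count_queens(int n)
{
    assert(n >= 1 && n <= 16);                  /* masks must fit in 32 bits */
    long total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* One thread creates the tasks; the whole team executes them. */
        for (int c = 0; c < n; c++) {
            #pragma omp task shared(total) firstprivate(c)
            {
                unsigned bit = 1u << c;
                long sub = solve(n, 1, bit, bit << 1, bit >> 1);

                /* One cheap atomic update per task, not per iteration. */
                #pragma omp atomic
                total += sub;
            }
        }
    }   /* the barrier ending the region waits for all tasks to finish */
    return total;
}
```

Calling `count_queens(8)` should give the well-known 92 solutions. Compiled without OpenMP support, the pragmas are ignored and the same code runs serially, which makes it easy to time both versions.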

I think your code is much too complex to be reviewed here. One error I saw immediately is that it is not even correct. Wherever you use an omp parallel for to compute a sum, you must use reduction(+: yourcountervariable) so that the results of the different threads are combined correctly. Otherwise one thread may overwrite the result of the others.
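As a small illustration of the reduction clause described above (the function `count_even` and its arguments are hypothetical, not taken from the asker's code), a counter accumulated inside a parallel loop could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the even values in an array. */
long count_even(const int *values, long n)
{
    assert(values != NULL && n >= 0);
    long count = 0;

    /* reduction(+:count) gives each thread a private copy of count,
       initialised to 0, and adds the copies into the original variable
       when the loop ends. Without the clause, the threads would race
       on a single shared counter and lose updates. */
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; i++) {
        if (values[i] % 2 == 0)
            count++;
    }
    return count;
}
```

The result is the same whether the loop runs on one thread or many, which is exactly what an unprotected shared counter does not guarantee.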

At least two reasons:

You're only doing 8 iterations of a very simple loop. Your runtime will be completely dominated by the overhead involved in setting up all the threads.
In some places, the critical section will cause contention; all the threads will be trying to enter the critical section continuously, and will block each other.
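Both points above can be sketched in a few lines. The names `scale`, `max_value` and `PARALLEL_THRESHOLD` are made up for this example, and the threshold value is an arbitrary assumption that would need to be tuned by measurement. The first function uses the `if` clause so that short loops stay serial; the second keeps a per-thread partial result so that the critical section is entered once per thread instead of once per iteration.

```c
#include <assert.h>

/* Assumed cutoff below which starting a thread team is not worth it. */
#define PARALLEL_THRESHOLD 10000

/* Hypothetical helper: multiplies every element by factor.
   The if clause makes the loop run serially when n is small. */
void scale(double *data, long n, double factor)
{
    #pragma omp parallel for if(n > PARALLEL_THRESHOLD)
    for (long i = 0; i < n; i++)
        data[i] *= factor;
}

/* Hypothetical helper: returns the largest element of values[0..n-1]. */
int max_value(const int *values, long n)
{
    assert(n > 0);
    int best = values[0];

    #pragma omp parallel
    {
        int local_best = values[0];     /* private to each thread */

        /* No locking inside the loop: each thread scans its share. */
        #pragma omp for nowait
        for (long i = 1; i < n; i++) {
            if (values[i] > local_best)
                local_best = values[i];
        }

        /* Entered once per thread, so contention stays negligible. */
        #pragma omp critical
        {
            if (local_best > best)
                best = local_best;
        }
    }
    return best;
}
```

Putting the comparison against `best` inside the loop body, guarded by `critical`, would give the same answer but would serialize every iteration, which is the contention described above.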