Why does OpenMP outperform threads?

Question

I have been calling this in OpenMP:

#pragma omp parallel for num_threads(totalThreads)
for (unsigned i = 0; i < totalThreads; i++)
{
    workOnTheseEdges(startIndex[i], endIndex[i]);
}

and this with C++11 std::thread (which I believe is just pthreads underneath):

vector<thread> threads;
for (unsigned i = 0; i < totalThreads; i++)
{
    threads.emplace_back(workOnTheseEdges, startIndex[i], endIndex[i]);
}
for (auto& t : threads)
{
    t.join();
}

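But the OpenMP version is 2x faster! I would have expected the C++11 threads to be faster, since they are lower-level. Note: the code above is not called just once; it may be called 10,000 times in a loop. Could that have something to do with it?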

Edit: to clarify, I use either the OpenMP version or the C++11 version, never both at once. With the OpenMP code the run takes 45 seconds; with the C++11 code it takes 100 seconds.

Answer 1

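Where does totalThreads come from in your OpenMP version? I bet it isn't startIndex.size().

The OpenMP version queues the requests onto totalThreads worker threads. It looks like the C++11 version creates startIndex.size() threads, which involves a huge amount of overhead if startIndex.size() is large.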
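A minimal sketch of the fix this answer implies, assuming the work can be split into index ranges: keep the number of threads fixed and give each thread one contiguous block of items, instead of starting one thread per item. `runChunked` and `RangeWork` are hypothetical names; the callback only stands in for the question's `workOnTheseEdges(start, end)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical callback type standing in for workOnTheseEdges(start, end):
// it processes the half-open index range [begin, end).
using RangeWork = std::function<void(std::size_t, std::size_t)>;

// Process [0, n) with at most maxThreads std::threads. Each thread receives
// one contiguous block, so the thread count does not grow with n.
void runChunked(std::size_t n, unsigned maxThreads, const RangeWork& work)
{
    assert(maxThreads > 0 && "need at least one thread");
    if (n == 0) return;

    std::size_t threadCount = std::min<std::size_t>(maxThreads, n);
    std::size_t chunk = (n + threadCount - 1) / threadCount;  // ceiling division

    std::vector<std::thread> threads;
    threads.reserve(threadCount);
    for (std::size_t t = 0; t < threadCount; ++t) {
        std::size_t begin = t * chunk;
        if (begin >= n) break;  // remaining threads would get empty ranges
        std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(work, begin, end);
    }
    for (auto& th : threads) th.join();
}
```

A natural choice for `maxThreads` is `std::thread::hardware_concurrency()`, clamped to at least 1 because it may return 0. Note that this only bounds the thread count: the threads are still created and joined on every call, so inside a loop that runs 10,000 times the per-call creation cost remains.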

Answer 2

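Consider the code below. The OpenMP version runs in 0 seconds, while the C++11 version takes 50 seconds. This is not because the function is doNothing, nor because the vector is declared inside the loop. As you might expect, the C++11 threads are created and destroyed in every iteration. OpenMP, on the other hand, actually implements a thread pool. This is not required by the standard, but it is what Intel's and AMD's implementations do.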

for (int j = 1; j < 100000; ++j)
{
    if (algorithmToRun == 1)
    {
        // C++11 version: create and join 16 threads on every outer iteration
        vector<thread> threads;
        for (int i = 0; i < 16; i++)
        {
            threads.emplace_back(doNothing);
        }
        for (auto& t : threads) t.join();
    }
    else if (algorithmToRun == 2)
    {
        // OpenMP version: a 16-thread parallel region on every outer iteration
        #pragma omp parallel for num_threads(16)
        for (unsigned i = 0; i < 16; i++)
        {
            doNothing();
        }
    }
}
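The thread-pool idea from this answer can be sketched in standard C++11: create the workers once and wake them for each round of work, so each outer iteration pays for a mutex and condition-variable handoff rather than 16 thread creations and joins. `WorkerTeam` and `run` are hypothetical names; this is a simplified illustration, not how any particular OpenMP runtime is implemented, and it does not propagate exceptions thrown by the task.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical team of persistent workers: threads are created once in the
// constructor and reused by every call to run().
class WorkerTeam {
public:
    explicit WorkerTeam(unsigned n) {
        for (unsigned id = 0; id < n; ++id)
            workers_.emplace_back([this, id] { loop(id); });
    }

    ~WorkerTeam() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        start_.notify_all();
        for (auto& w : workers_) w.join();
    }

    WorkerTeam(const WorkerTeam&) = delete;
    WorkerTeam& operator=(const WorkerTeam&) = delete;

    // Run task(id) once on every worker and block until all of them finish.
    void run(std::function<void(unsigned)> task) {
        assert(task && "task must be callable");
        std::unique_lock<std::mutex> lock(m_);
        task_ = std::move(task);
        pending_ = static_cast<unsigned>(workers_.size());
        ++generation_;  // a new generation wakes every worker for this round
        start_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
        task_ = nullptr;
    }

private:
    void loop(unsigned id) {
        std::size_t seen = 0;  // last generation this worker executed
        for (;;) {
            std::function<void(unsigned)> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                start_.wait(lock, [&] { return stop_ || generation_ != seen; });
                if (stop_) return;
                seen = generation_;
                task = task_;  // copy so the task runs outside the lock
            }
            task(id);
            std::lock_guard<std::mutex> lock(m_);
            if (--pending_ == 0) done_.notify_one();
        }
    }

    std::vector<std::thread> workers_;
    std::mutex m_;
    std::condition_variable start_, done_;
    std::function<void(unsigned)> task_;
    std::size_t generation_ = 0;
    unsigned pending_ = 0;
    bool stop_ = false;
};
```

In the benchmark above, the `algorithmToRun == 1` branch would then construct one `WorkerTeam team(16);` before the outer loop and call `team.run([](unsigned) { doNothing(); });` inside it, so the same 16 threads serve every iteration.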


Answer 1: score 3, posted 2014-04-23 21:51:29
Answer 2: score 2, accepted, posted 2014-04-25 05:46:59
