
Replacing TBB parallel_for with OpenMP

I'm trying to come up with an equivalent replacement, using OpenMP, for an Intel TBB parallel_for loop that uses a tbb::blocked_range. Digging around online, I've only managed to find mention of one other person doing something similar: a patch submitted to the Open Cascade project, in which the TBB loop appeared as follows (but did not use a tbb::blocked_range):

tbb::parallel_for_each (aFaces.begin(), aFaces.end(), *this);

and the OpenMP equivalent was:

int i, n = aFaces.size();
#pragma omp parallel for private(i)
for (i = 0; i < n; ++i)
    Process (aFaces[i]);

Here is the TBB loop I'm trying to replace:

tbb::parallel_for( tbb::blocked_range<size_t>( 0, targetList.size() ), DoStuff( targetList, data, vec, ptr ) );

It uses the DoStuff class to carry out the work:

class DoStuff
{
private:
    List& targetList;
    Data* data;
    vector<things>& vec;
    Worker* ptr;

public:
    DoStuff( List& pass_targetList, 
             Data* pass_data, 
             vector<things>& pass_vec, 
             Worker* pass_worker ) 
        : targetList(pass_targetList), data(pass_data), vec(pass_vec), ptr(pass_worker)
    {
    }

    void operator() ( const tbb::blocked_range<size_t>& range ) const
    {
        for ( size_t idx = range.begin(); idx != range.end(); ++idx )
        {
            ptr->PerformWork(&targetList[idx], data->getData(), &vec);
        }
    }
};

My understanding, based on this reference, is that TBB will divide the blocked range into smaller sub-ranges and give each thread one of those sub-ranges to loop through. Since each thread gets its own copy of the DoStuff object, which holds a bunch of references and pointers, the threads essentially share those underlying resources.

Here's what I've come up with as an equivalent replacement in OpenMP:

int index = 0;
#pragma omp parallel for private(index)
for (index = 0; index < targetList.size(); ++index)
{
    ptr->PerformWork(&targetList[index], data->getData(), &vec);
}

Because of circumstances outside of my control (this is merely one component in a much larger system that spans 5+ computers), stepping through the code with a debugger to see exactly what's happening is... unlikely. I'm working on getting remote debugging going, but it's not looking very promising. All I know for sure is that the above OpenMP code is somehow doing something different than the TBB version did, and the expected results after calling PerformWork for each index are not obtained.

Given the information above, does anyone have any ideas on why the OpenMP and TBB code are not functionally equivalent?

Following Ben and Rick's advice, I tested the following loop without the omp pragma (serially) and obtained my expected results (very slowly). After adding the pragma back in, the parallel code also performs as expected. Looks like the problem was either in declaring the index as private outside of the loop, or in evaluating targetList.size() as numTargets inside the loop condition. Or both.

    int numTargets = targetList.size();
    #pragma omp parallel for
    for (int index = 0; index < numTargets; ++index)
    {
        ptr->PerformWork(&targetList[index], data->getData(), &vec);
    }
