
OpenMP for loop with specific threads

I'm new to parallel programming. I'm trying to parallelize a point cloud processing step. My program structure is shown below. First, I split the point cloud into partial clouds. My aim is for each thread to call the fillFrustumCloud() function separately, on its own partial cloud.

int num_threads = 12;

std::vector<CloudColored::Ptr> vector_colored_projected_clouds(num_threads);
std::vector<Cloud::Ptr> vector_projected_clouds(num_threads);

omp_set_num_threads(num_threads);

// private( ) shared()
#pragma omp parallel  shared(vector_colored_projected_clouds,vector_projected_clouds)
{
    
    for(int i=0; i<num_threads; i++)
    {

        #pragma omp critical
        {
            std::cout << "Thread id: " << omp_get_thread_num() << " loop id: " << i <<  std::endl;
        }

        const unsigned int  start_index = cloud_in->size()/num_threads*i;
        const unsigned int  end_index = cloud_in->size()/num_threads*(i+1);

        Cloud::Ptr partial_cloud(new Cloud);

        if(i==num_threads-1)
        {
            partial_cloud->points.assign(cloud_in->points.begin()+start_index, cloud_in->points.end());
        }else{
            partial_cloud->points.assign(cloud_in->points.begin()+start_index, cloud_in->points.begin()+end_index);
        }

        LidcamHelpers::fillFrustumCloud(partial_cloud, mat_point_transformer, img_size, vector_colored_projected_clouds,
                                        vector_projected_clouds, i, interested_detections, id, reshaped_img);
    }
}

But the output is:

Thread id: 0 loop id: 0
Thread id: 1 loop id: 0
Thread id: 2 loop id: 0
Thread id: 3 loop id: 0
Thread id: 0 loop id: 1
Thread id: 1 loop id: 1
Thread id: 2 loop id: 1
Thread id: 3 loop id: 1
Thread id: 0 loop id: 2
Thread id: 3 loop id: 2
Thread id: 2 loop id: 2
Thread id: 1 loop id: 2
Thread id: 3 loop id: 3
Thread id: 1 loop id: 3
Thread id: 2 loop id: 3
Thread id: 0 loop id: 3

According to my aim, it should look like this:

Thread id: 0 loop id: 0
Thread id: 1 loop id: 1
Thread id: 2 loop id: 2

Note: I pass vector_colored_projected_clouds and vector_projected_clouds into the function by reference in order to store the results. I guess they should be shared variables.

The #pragma omp parallel construct creates a parallel region with as many threads as you have requested. Hence, when you do:

#pragma omp parallel
{
    for(int i=0; i<num_threads; i++)
    {
       ... 
    }
}

every thread in the parallel region executes all the iterations of the loop. That is why you get 16 lines of output (i.e., 4 threads x 4 loop iterations).
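As a rough illustration of that replication, the sketch below counts how many loop bodies actually run inside a plain parallel region. The function name count_replicated is hypothetical and not part of the question's code; the _OPENMP guard is only there so that the sketch also compiles when OpenMP is disabled.

```cpp
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns {team size, total number of loop bodies executed}.
std::pair<int, int> count_replicated(int n)
{
    int threads = 1;    // stays 1 if the code is built without OpenMP
    int executed = 0;   // shared counter, updated atomically

    #pragma omp parallel
    {
#ifdef _OPENMP
        // One thread records the size of the team.
        #pragma omp single
        threads = omp_get_num_threads();
#endif
        // No work-sharing directive here: each thread runs the whole loop.
        for (int i = 0; i < n; i++)
        {
            #pragma omp atomic
            executed++;
        }
    }
    return {threads, executed};
}
```

With a team of 4 threads and n = 4, the second value should come out as 16, matching the 16 lines of output in the question.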

If you want to distribute the iterations of a loop among the threads, you should use #pragma omp for instead. So in your code you can do either:

#pragma omp parallel
{
    #pragma omp for
    for(int i=0; i<num_threads; i++)
    {
       ... 
    }
}

or

#pragma omp parallel for
for(int i=0; i<num_threads; i++)
{
   ... 
}

Since you only want to distribute the iterations of the loop among threads, you can use the latter (i.e., #pragma omp parallel for).
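A minimal sketch of how this could look for the chunking in the question, using a plain std::vector<int> instead of a point cloud. The function sum_chunks and the summation are hypothetical stand-ins for fillFrustumCloud(); the point is that each iteration runs exactly once and writes only to its own slot of the shared result vector, so no extra synchronization should be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the per-chunk work in the question:
// splits `data` into `num_chunks` contiguous ranges and sums each one.
std::vector<long long> sum_chunks(const std::vector<int>& data, int num_chunks)
{
    assert(num_chunks > 0);
    std::vector<long long> partial(static_cast<std::size_t>(num_chunks)); // one slot per chunk
    const std::size_t chunk = data.size() / static_cast<std::size_t>(num_chunks);

    // The iterations are divided among the threads; each i is executed once.
    #pragma omp parallel for
    for (int i = 0; i < num_chunks; i++)
    {
        const std::size_t start = chunk * static_cast<std::size_t>(i);
        // The last chunk also takes the remainder of the integer division.
        const std::size_t end = (i == num_chunks - 1)
                                    ? data.size()
                                    : chunk * static_cast<std::size_t>(i + 1);
        // Each iteration writes only to partial[i].
        partial[i] = std::accumulate(data.begin() + start, data.begin() + end, 0LL);
    }
    return partial;
}
```

For example, calling sum_chunks on the values 0 to 99 with 12 chunks gives 12 partial sums that add up to 4950, regardless of how many threads the runtime uses.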

It looks as though you are using

#pragma omp critical
{
    std::cout << "Thread id: " << omp_get_thread_num() << " loop id: " << i <<  std::endl;
}

for debugging purposes. Bear in mind, however, that even with the critical region, the order in which threads print is non-deterministic. If you would rather have the threads print in a deterministic order, use #pragma omp ordered instead of critical (the enclosing loop directive must then also carry the ordered clause, e.g., #pragma omp parallel for ordered). The ordered construct enforces that the block of code it wraps is executed in the same order as it would be if the loop ran sequentially.
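A small sketch of the ordered variant, assuming the loop body can be split into a part that may run concurrently and a part that must happen in iteration order. The function visit_in_order is hypothetical, and it records the iteration index in a vector instead of printing so that the resulting order can be inspected.

```cpp
#include <vector>

// Hypothetical helper: records the order in which the ordered block runs.
std::vector<int> visit_in_order(int n)
{
    std::vector<int> visited;

    // The ordered clause on the loop directive is required
    // for the ordered region inside the loop body.
    #pragma omp parallel for ordered
    for (int i = 0; i < n; i++)
    {
        // Work placed here may still run concurrently across threads.

        #pragma omp ordered
        {
            // Only this block is serialized, in sequential iteration order.
            visited.push_back(i);
        }
    }
    return visited;
}
```

Since the ordered block is serialized, heavy use of it removes most of the parallel speedup, so it is probably best kept for debugging output like the print statement above.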
