Parallelize counting for-loops with OpenMP

I have a 2D image in which I want to count how often each color occurs and store the results in an array. I know the number of colors in advance, so I can size the array beforehand. My problem is that the counting takes too long. How can I speed it up with OpenMP?
My current serial code is:

std::vector<int> ref_color_num_thread;
ref_color_num.resize(ref_color.size());
std::fill(ref_color_num.begin(), ref_color_num.end(), 0);
ref_color_num_thread.resize(ref_color.size());
std::fill(ref_color_num_thread.begin(), ref_color_num_thread.end(), 0);

for (int i = 0; i < image.width(); i++)
{
    for (int j = 0; j < image.height(); j++)
    {
        for (int k = 0; k < (int)ref_color.size(); k++)
        {
            if (image(i, j, 0, 0) == ref_color[k].R && image(i, j, 0, 1) == ref_color[k].G && image(i, j, 0, 2) == ref_color[k].B)
                ref_color_num_thread[k]++;
        }
    }
}

My first approach was to put #pragma omp parallel for on each loop (one loop per attempt), but every time the program crashes because of an invalid memory access. Do I have to use private() for my vector?

What you're doing is filling a histogram of your colors. This is equivalent to an array reduction in C/C++ with OpenMP. Before OpenMP 4.5, OpenMP had no built-in support for this in C/C++ (Fortran had it, because the array size is always known in Fortran, whereas in C/C++ it is only known for static arrays). However, it's easy to do an array reduction in C/C++ with OpenMP yourself:

#pragma omp parallel
{
    // Each thread counts into its own private histogram, so the hot loop has no shared writes.
    std::vector<int> ref_color_num_thread_private(ref_color.size(), 0);
    #pragma omp for
    for (int i = 0; i < image.width(); i++) {
        for (int j = 0; j < image.height(); j++) {
            for (int k = 0; k < (int)ref_color.size(); k++) {
                if (image(i, j, 0, 0) == ref_color[k].R && image(i, j, 0, 1) == ref_color[k].G && image(i, j, 0, 2) == ref_color[k].B)
                    ref_color_num_thread_private[k]++;
            }
        }
    }
    // Merge the private histograms into the shared one, one thread at a time.
    #pragma omp critical
    {
        for (int i = 0; i < (int)ref_color.size(); i++) {
            ref_color_num_thread[i] += ref_color_num_thread_private[i];
        }
    }
}

I went into a lot more detail about this here: Fill histograms (array reduction) in parallel with OpenMP without using a critical section

There I showed how to do an array reduction without a critical section, but it's a lot trickier. You should test the first case and see if it works well for you first. As long as the number of colors (ref_color.size()) is small compared to the number of pixels, it should parallelize well. Otherwise, you might need to try the second case without a critical section.
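One possible way to avoid the critical section is sketched below. It is not necessarily the method from the linked answer: it gives every thread its own row in a shared scratch buffer and then merges the rows with a second parallel loop over the colors. The Rgb struct, the flat pixels vector, and the function name are hypothetical stand-ins for the image class used in the question.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the image: a flat list of RGB pixels.
struct Rgb {
    unsigned char R, G, B;
};

inline bool same_color(const Rgb& a, const Rgb& b) {
    return a.R == b.R && a.G == b.G && a.B == b.B;
}

// Counts how often each reference color occurs, without a critical section.
std::vector<int> count_colors_no_critical(const std::vector<Rgb>& pixels,
                                          const std::vector<Rgb>& ref_color) {
    const int n_colors = static_cast<int>(ref_color.size());
    const long n_pixels = static_cast<long>(pixels.size());
    std::vector<int> totals(n_colors, 0);
    std::vector<int> partial;  // one row of n_colors counters per thread

    #pragma omp parallel
    {
        int n_threads = 1;
        int tid = 0;
#ifdef _OPENMP
        n_threads = omp_get_num_threads();
        tid = omp_get_thread_num();
#endif
        // One thread allocates the scratch rows; the implicit barrier at the
        // end of "single" makes them visible to every thread.
        #pragma omp single
        partial.assign(static_cast<std::size_t>(n_threads) * n_colors, 0);

        int* mine = partial.data() + static_cast<std::size_t>(tid) * n_colors;

        // Each thread only writes to its own row, so there is no race here.
        #pragma omp for
        for (long p = 0; p < n_pixels; p++) {
            for (int k = 0; k < n_colors; k++) {
                if (same_color(pixels[p], ref_color[k])) mine[k]++;
            }
        }

        // Merge step: each thread sums a disjoint set of colors over all rows.
        #pragma omp for
        for (int k = 0; k < n_colors; k++) {
            for (int t = 0; t < n_threads; t++) {
                totals[k] += partial[static_cast<std::size_t>(t) * n_colors + k];
            }
        }
    }
    return totals;
}
```

The merge is itself parallel, which only pays off when the number of colors is large. Neighboring rows may share cache lines, so padding each row could help if the counters turn out to be a bottleneck.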

There is a race condition if one of the outer two loops (i or j) is parallelized, because the inner loop iterates over the whole vector (k), so different threads can increment the same element of ref_color_num_thread at the same time. I think your crash is because of that.

You have to restructure your program. It is not trivial, but one idea is that each thread uses a local copy of the ref_color_num_thread vector. Once the computation is finished, you can sum up all the vectors.

If k is large enough to provide enough parallelism, you could exchange the loops. Instead of "i,j,k" you could iterate in the order "k,i,j". If I'm not mistaken, this violates no dependencies. Then you can parallelize the outer k loop and let the inner i and j loops execute sequentially.
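The loop exchange could look roughly like the sketch below. It assumes the image can be treated as a flat list of pixels; the Rgb struct, the pixels vector, and the function name are hypothetical and only stand in for the image class from the question.

```cpp
#include <vector>

// Hypothetical stand-in for the image: a flat list of RGB pixels.
struct Rgb {
    unsigned char R, G, B;
};

// Iterates in the order "k, pixels": the outer loop over the reference
// colors is parallel, the inner scan over the image is sequential.
std::vector<int> count_colors_by_reference(const std::vector<Rgb>& pixels,
                                           const std::vector<Rgb>& ref_color) {
    const int n_colors = static_cast<int>(ref_color.size());
    std::vector<int> counts(n_colors, 0);

    #pragma omp parallel for
    for (int k = 0; k < n_colors; k++) {
        int hits = 0;  // private to the iteration, so no shared increments
        for (const Rgb& px : pixels) {
            if (px.R == ref_color[k].R && px.G == ref_color[k].G &&
                px.B == ref_color[k].B) {
                hits++;
            }
        }
        counts[k] = hits;  // each k is written by exactly one thread
    }
    return counts;
}
```

Every thread now reads the whole image once per color it handles, so this variant is only attractive when there are at least as many colors as threads.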

Update:

#pragma omp for also supports reductions, for example:

#pragma omp for reduction(+ : nSum)

Here is a link to some documentation.

Maybe that can help you to restructure your program.
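As a rough illustration of the reduction clause, the sketch below first counts a single color with a scalar reduction and then counts all colors with a reduction over an array section. The second form assumes a compiler that implements OpenMP 4.5 or later. The Rgb struct, the flat pixels vector, and both function names are hypothetical stand-ins for the code in the question.

```cpp
#include <vector>

// Hypothetical stand-in for the image: a flat list of RGB pixels.
struct Rgb {
    unsigned char R, G, B;
};

inline bool same_color(const Rgb& a, const Rgb& b) {
    return a.R == b.R && a.G == b.G && a.B == b.B;
}

// Scalar reduction: every thread gets a private nSum, and OpenMP adds the
// private copies together at the end of the loop.
int count_one_color(const std::vector<Rgb>& pixels, const Rgb& wanted) {
    const long n_pixels = static_cast<long>(pixels.size());
    int nSum = 0;
    #pragma omp parallel for reduction(+ : nSum)
    for (long p = 0; p < n_pixels; p++) {
        if (same_color(pixels[p], wanted)) nSum++;
    }
    return nSum;
}

// Array-section reduction (assumes OpenMP 4.5 or later): every thread gets a
// private copy of counts[0..n_colors), and the copies are summed at the end.
std::vector<int> count_all_colors(const std::vector<Rgb>& pixels,
                                  const std::vector<Rgb>& ref_color) {
    const int n_colors = static_cast<int>(ref_color.size());
    const long n_pixels = static_cast<long>(pixels.size());
    std::vector<int> totals(n_colors, 0);
    if (n_colors == 0) return totals;  // avoid a zero-length array section

    int* counts = totals.data();
    #pragma omp parallel for reduction(+ : counts[:n_colors])
    for (long p = 0; p < n_pixels; p++) {
        for (int k = 0; k < n_colors; k++) {
            if (same_color(pixels[p], ref_color[k])) counts[k]++;
        }
    }
    return totals;
}
```

Without OpenMP enabled the pragmas are ignored and both functions run serially with the same result.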
