OpenMP and 17 Nested For-Loops

I have a giant nested for-loop designed to set a large array to its default value. I'm trying to use OpenMP for the first time to parallelize it, and I have no idea where to begin. I have been reading tutorials, and I'm afraid the whole loop will be executed independently on each of the N cores, instead of the N cores dividing the work among themselves to produce a single common result. The code is in C, compiled with Visual Studio v14 (Visual Studio 2015). Any help for this newbie is appreciated. Thanks! (Attached below is the monster nested for-loop...)

    for (j = 0;j < box1; j++)
    {
        for (k = 0; k < box2; k++)
        {
            for (l = 0; l < box3; l++)
            {
                for (m = 0; m < box4; m++)
                {
                    for (x = 0;x < box5; x++)
                    {
                        for (y = 0; y < box6; y++)
                        {
                            for (xa = 0;xa < box7; xa++)
                            {
                                for (xb = 0; xb < box8; xb++)
                                {
                                    for (nb = 0; nb < memvara; nb++)
                                    {
                                        for (na = 0; na < memvarb; na++)
                                        {
                                            for (nx = 0; nx < memvarc; nx++)
                                            {
                                                for (nx1 = 0; nx1 < memvard; nx1++)
                                                {
                                                    for (naa = 0; naa < adirect; naa++)
                                                    {
                                                        for (nbb = 0; nbb < tdirect; nbb++)
                                                        {
                                                            for (ncc = 0; ncc < fs; ncc++)
                                                            {
                                                                for (ndd = 0; ndd < bs; ndd++)
                                                                {
                                                                    for (o = 0; o < outputnum; o++)
                                                                    {
                                                                        lookup->n[j][k][l][m][x][y][xa][xb][nb][na][nx][nx1][naa][nbb][ncc][ndd][o] = -3;     //set to default value

                                                                    }
                                                                }
                                                            }
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

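If n is actually a multidimensional array (one contiguous block of memory, not an array of pointers to separately allocated rows), you can treat it as a flat one-dimensional array and fill it with a single loop: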

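    size_t i;
    size_t count = sizeof(lookup->n) / sizeof(int); /* assumes the elements of n are int */
    int *p = (int*)lookup->n;                       /* view the whole array as one flat block */
    for( i = 0; i < count; i++ )
    {
        p[i] = -3;                                  /* set to default value */
    }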

Now, that's much easier to parallelize.

For more on why this works (it applies to C as well), see: How do I use arrays in C++?

This is more of an extended comment than an answer.

Find the iteration limit (i.e. the variable among box1, box2, etc.) with the largest value. Revise your loop nest so that the outermost loop runs over that dimension, then simply parallelise the outermost loop. Choosing the largest limit gives the threads the most outer iterations to share, so each thread ends up running a nearly equal number of inner-loop iterations.

Collapsing loops, whether you can use OpenMP's collapse clause or have to do it by hand, is only useful when you have reason to believe that parallelising over only the outermost loop will result in significant load imbalance. That seems very unlikely in this case. Every innermost iteration does the same work (a single store), so imbalance can only come from an outermost trip count that does not divide evenly among the threads, and that effect is small once the outermost limit is large compared with the number of threads. Distributing the work (approximately) evenly across the available threads at the outermost level should therefore provide reasonably good load balancing.

I believe, based on some secondhand reading, that the solution might be to add #pragma omp parallel for collapse(N) directly above the nested loops. However, the collapse clause requires OpenMP 3.0 or later, and the whole project is based on Visual Studio (and therefore OpenMP 2.0) for now...

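Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.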

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM