
OpenMP and 17 Nested For-Loops

I have a giant nested for-loop, designed to set a large array to its default value. I'm trying to use OpenMP for the first time to parallelize it, and have no idea where to begin. I have been reading tutorials, and am afraid the work will simply be repeated independently on each of the N cores, instead of the N cores dividing the work amongst themselves toward a common output. The code is in C, compiled in Visual Studio v14. Any help for this newbie is appreciated -- thanks! (Attached below is the monster nested for-loop...)

    for (j = 0; j < box1; j++)
    {
        for (k = 0; k < box2; k++)
        {
            for (l = 0; l < box3; l++)
            {
                for (m = 0; m < box4; m++)
                {
                    for (x = 0; x < box5; x++)
                    {
                        for (y = 0; y < box6; y++)
                        {
                            for (xa = 0; xa < box7; xa++)
                            {
                                for (xb = 0; xb < box8; xb++)
                                {
                                    for (nb = 0; nb < memvara; nb++)
                                    {
                                        for (na = 0; na < memvarb; na++)
                                        {
                                            for (nx = 0; nx < memvarc; nx++)
                                            {
                                                for (nx1 = 0; nx1 < memvard; nx1++)
                                                {
                                                    for (naa = 0; naa < adirect; naa++)
                                                    {
                                                        for (nbb = 0; nbb < tdirect; nbb++)
                                                        {
                                                            for (ncc = 0; ncc < fs; ncc++)
                                                            {
                                                                for (ndd = 0; ndd < bs; ndd++)
                                                                {
                                                                    for (o = 0; o < outputnum; o++)
                                                                    {
                                                                        lookup->n[j][k][l][m][x][y][xa][xb][nb][na][nx][nx1][naa][nbb][ncc][ndd][o] = -3;     //set to default value

                                                                    }
                                                                }
                                                            }
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

If n is actually a multidimensional array, you can do this:

    size_t i;
    /* Total number of int elements in the whole array.  This only works if
       lookup->n is a true (multidimensional) array member, not a pointer. */
    size_t count = sizeof(lookup->n) / sizeof(int);
    int *p = (int*)lookup->n;   /* view the array as one flat block of ints */
    for( i = 0; i < count; i++ )
    {
        p[i] = -3;              /* default value, as in the original loop nest */
    }

Now, that's much easier to parallelize.
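
A minimal sketch of how that flat loop could then be parallelised, assuming lookup->n really is a multidimensional array of int inside the struct (the helper name fill_int_array is purely illustrative). Note that OpenMP 2.0, the version shipped with Visual Studio, requires a signed loop index, hence long long rather than size_t:

    #include <omp.h>

    /* Sketch only: fill `count` ints with `value` in parallel.  The iterations
       are independent, so no synchronisation is needed. */
    static void fill_int_array(int *p, long long count, int value)
    {
        long long i;                 /* OpenMP 2.0 requires a signed loop index */
        #pragma omp parallel for
        for (i = 0; i < count; i++)
        {
            p[i] = value;
        }
    }

    /* Usage, with the same flattening trick as above: */
    /* fill_int_array((int*)lookup->n, (long long)(sizeof(lookup->n) / sizeof(int)), -3); */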

Read more on why this works here (applies to C as well): How do I use arrays in C++?

This is more of an extended comment than an answer.

Find the iteration limit (i.e. the variable among box1, box2, etc.) with the largest value. Revise your loop nest so that the outermost loop runs over that variable, then simply parallelise the outermost loop. Choosing the largest trip count means that, in the limit, each thread gets an approximately equal number of inner-loop iterations to run.
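
For illustration, here is a minimal sketch of that advice applied to the loop nest from the question, assuming box1 happens to be the largest limit (otherwise, move the largest loop to the outside first). The inner loops are elided:

    /* Only the outermost loop is parallelised; its index j is made private
       automatically, but the other loop counters declared outside the
       parallel region must be made private explicitly. */
    #pragma omp parallel for private(k, l, m, x, y, xa, xb, nb, na, nx, nx1, naa, nbb, ncc, ndd, o)
    for (j = 0; j < box1; j++)
    {
        for (k = 0; k < box2; k++)
        {
            /* ... the remaining loops exactly as in the question ... */
        }
    }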

Collapsing loops, whether you can use OpenMP's collapse clause or have to do it by hand, is only useful when you have reason to believe that parallelising over only the outermost loop will result in significant load imbalance. That seems very unlikely in this case, so distributing the work (approximately) evenly across the available threads at the outermost level would probably provide reasonably good load balancing.

I believe, based on tertiary research, that the solution might be found in adding #pragma omp parallel for collapse(N) directly above the nested loops. However, the collapse clause is only available in OpenMP 3.0 and later, and the whole project is built with Visual Studio (and therefore OpenMP 2.0) for now...
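
Since the collapse clause is not available under OpenMP 2.0, one possible workaround is to collapse the outer loops by hand: iterate over a single fused index and recover the original indices with division and modulo. A rough sketch fusing only the two outermost loops (assuming box1 * box2 fits in an int):

    /* Hand-collapsed version of the two outermost loops for OpenMP 2.0.
       jk runs over box1*box2 combined iterations; j and k are recovered
       inside the loop body, so they are private to each thread by construction. */
    int jk;
    #pragma omp parallel for private(l, m, x, y, xa, xb, nb, na, nx, nx1, naa, nbb, ncc, ndd, o)
    for (jk = 0; jk < box1 * box2; jk++)
    {
        int j = jk / box2;
        int k = jk % box2;

        for (l = 0; l < box3; l++)
        {
            /* ... the remaining loops exactly as in the question ... */
        }
    }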
