
Non-parallel for loop in a parallel block

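I have a parallel block that spawns a certain number of threads. All of these threads should then start a "shared" for loop that contains several parallel for loops. For example, something like this: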

// 1. The parallel region spawns a number of threads.
#pragma omp parallel
{
    // 2. Each thread does something before it enters the loop below.
    doSomethingOnEachThreadAsPreparation();

    // 3. This loop should be run by all threads synchronously; i belongs
    // to all threads simultaneously.
    // Basically there is only one variable i. When all threads reach this
    // loop, i is first set to zero.
    for (int i = 0; i < 100; i++)
    {
        // 4. Then each thread calls this function (this happens in parallel)
        doSomethingOnEachThreadAtTheStartOfEachIteration();

        // 5. Then all threads work on this for loop in parallel
        #pragma omp for
        for (int k = 0; k < 100000000; k++)
            doSomethingVeryTimeConsumingInParallel(k);
        // 6. After the parallel for loop there is (always) an implicit barrier 

        // 7. When all threads finished the for loop they call this method in parallel.
        doSomethingOnEachThreadAfterEachIteration();

        // 8. Here should be another barrier. Once every thread has finished
        // the call above, they jump back to the top of the for loop, 
        // where i is set to i + 1. If the condition for the loop
        // holds, continue at 4., otherwise go to 9. 
    }

    // 9. When the "non-parallel" loop has finished each thread continues.
    doSomethingMoreOnEachThread();
}

I think this kind of behavior may already be achievable with #pragma omp single and a shared variable i, but I am not sure yet.

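What the functions actually do is irrelevant; this is only about the control flow. I added comments describing how I want it to behave. If I understand correctly, the loop at 3. would normally create a separate variable i for each thread, and the loop header would normally not be executed by only a single thread. But that is what I want in this case.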

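You can simply run the for loop in all threads. Depending on your algorithm, you may need to synchronize after every iteration (as shown below) or only once all iterations have finished.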

#pragma omp parallel
{
  // enter parallel region
  doSomethingOnEachThreadAsPreparation();
    // done in parallel by all threads

  for (int i = 0; i < 100; i++)
    {
        doSomethingOnEachThreadAtTheStartOfEachIteration();
#       pragma omp for
        // parallelize the for loop
        for (int k = 0; k < 100000000; k++)
            doSomethingVeryTimeConsumingInParallel(k);
        // implicit barrier

        doSomethingOnEachThreadAfterEachIteration();
#       pragma omp barrier
        // A barrier may be required here so that every thread
        // finishes iteration i before any thread starts iteration i + 1.
        // If the algorithm does not require it,
        // performance will be better without the barrier.
    }

    doSomethingMoreOnEachThread();
    // still in parallel
}
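
To make the control flow concrete, here is a minimal self-contained sketch of the same pattern. It is only an illustration under stated assumptions: the function name run_shared_loop, its parameters outer and inner, and the helper expensive_work are hypothetical stand-ins and do not appear in the question. The reduction is added only so that the function produces a value that can be checked.

```c
#include <assert.h>

/* Hypothetical stand-in for doSomethingVeryTimeConsumingInParallel(k). */
static long expensive_work(int k)
{
    return k % 7;
}

/* Runs `outer` synchronized iterations; each one splits `inner` work items
   across the team. Returns the sum of all work results. */
long run_shared_loop(int outer, int inner)
{
    assert(outer >= 0 && inner >= 0);
    long total = 0; /* declared outside the region, so it is shared */

    #pragma omp parallel
    {
        /* i is declared inside the region: every thread has its own copy,
           and every thread executes all `outer` iterations. */
        for (int i = 0; i < outer; i++)
        {
            /* The k iterations are divided among the threads of the team. */
            #pragma omp for reduction(+:total)
            for (int k = 0; k < inner; k++)
                total += expensive_work(k);
            /* Implicit barrier: no thread starts outer iteration i + 1
               before the whole team has finished this inner loop. */
        }
    }
    return total;
}
```

Each thread keeps a private copy of i, but because every thread passes through the same barriers, all copies hold the same value while the inner loop runs. In effect this behaves like the single shared i described in the question. If the code is compiled without OpenMP support, the pragmas are ignored and the function computes the same result serially.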

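As Zulan pointed out, wrapping the main for loop in an omp single and then re-entering a parallel section inside it does not work unless nested parallelism is enabled. In that case, threads are recreated at every iteration, which causes a severe slowdown.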

omp_set_nested(1); // enable nested parallelism (deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels)
#pragma omp parallel
{
  // enter parallel region
  doSomethingOnEachThreadAsPreparation();
    // done in parallel by all threads

# pragma omp single
  // only one thread runs the loop
  for (int i = 0; i < 100; i++)
    {
#     pragma omp parallel
      {
        // create a new nested parallel section:
        // new threads are created, and this will
        // certainly degrade performance
        doSomethingOnEachThreadAtTheStartOfEachIteration();
#       pragma omp for
        // and we parallelize the for loop
        for (int k = 0; k < 100000000; k++)
            doSomethingVeryTimeConsumingInParallel(k);
        // implicit barrier

        doSomethingOnEachThreadAfterEachIteration();
      }
      // we leave the parallel section (implicit barrier)
    }
    // we leave the single section

    doSomethingMoreOnEachThread();
    // and we continue running in parallel
}
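
If the real goal is only to have some step executed once per outer iteration, nesting may not be needed at all. The following is a hedged sketch, not part of the original answer: it keeps the outer loop in every thread and places an omp single inside it, so the existing team is reused. The function name count_single_steps and the variables steps and work_done are hypothetical and exist only so that the behavior can be checked.

```c
#include <assert.h>

/* Returns how many times the once-per-iteration step was executed. */
int count_single_steps(int outer, int inner)
{
    assert(outer >= 0 && inner >= 0);
    int steps = 0;      /* shared: written only inside `single` */
    long work_done = 0; /* shared: updated through the reduction */

    #pragma omp parallel
    {
        for (int i = 0; i < outer; i++)
        {
            /* Exactly one thread of the existing team runs this block;
               the others wait at the implicit barrier at its end. */
            #pragma omp single
            {
                steps++;
            }

            /* The whole team then shares the inner loop. */
            #pragma omp for reduction(+:work_done)
            for (int k = 0; k < inner; k++)
                work_done += 1;
            /* implicit barrier */
        }
    }

    /* Every inner iteration ran exactly once per outer iteration. */
    assert(work_done == (long)outer * inner);
    return steps;
}
```

Because single and the worksharing loop both end with an implicit barrier, the increment of steps never overlaps with the next iteration, and no threads are created or destroyed inside the loop.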

