Where should I put ANNOTATE_ITERATION_TASK?

I'm using Intel Advisor to analyze my parallel application. I have this code, which is the main loop of my program and where most of the time is spent:

   for(size_t i=0; i<wrapperIndexes.size(); i++){
       const int r = wrapperIndexes[i].r;
       const int c = wrapperIndexes[i].c;
       const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
       if ( (val > positiveThreshold && (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
            (val < negativeThreshold && (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
          // either positive -> local max. or negative -> local min.
            ANNOTATE_ITERATION_TASK(localizeKeypoint);
            localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
   }

As you can see, localizeKeypoint is where the loop spends most of its time (if you don't consider the if clause). I want to run a Suitability Report to estimate the gain from parallelizing the loop above. So I've written this:

   ANNOTATE_SITE_BEGIN(solve);
   for(size_t i=0; i<wrapperIndexes.size(); i++){
       const int r = wrapperIndexes[i].r;
       const int c = wrapperIndexes[i].c;
       const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
       if ( (val > positiveThreshold && (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
            (val < negativeThreshold && (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
          // either positive -> local max. or negative -> local min.
            ANNOTATE_ITERATION_TASK(localizeKeypoint);
            localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
   }
   ANNOTATE_SITE_END();

And the Suitability Report gives an excellent 6.69x gain, as you can see here:

[Image: Suitability Report showing the 6.69x gain]

However, when I launch the dependencies check, I get this problem message:

[Image: problem message from the dependencies check]

In particular, see "Missing start task".

In addition, if I place ANNOTATE_ITERATION_TASK at the beginning of the loop, like this:

   ANNOTATE_SITE_BEGIN(solve);
   for(size_t i=0; i<wrapperIndexes.size(); i++){
        ANNOTATE_ITERATION_TASK(localizeKeypoint);
       const int r = wrapperIndexes[i].r;
       const int c = wrapperIndexes[i].c;
       const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
       if ( (val > positiveThreshold && (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
            (val < negativeThreshold && (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
          // either positive -> local max. or negative -> local min.
            localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
   }
   ANNOTATE_SITE_END();

The gain is horrible:

[Image: Suitability Report for the second variant]

Am I doing something wrong?

INTEL_OPT=-O3 -simd -xCORE-AVX2 -parallel -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl 

You have to use the second approach, where you put ANNOTATE_ITERATION_TASK at the very beginning of the annotated loop body. Otherwise you get (a) a wrong performance projection in Suitability and (b) "Missing start task" in Correctness.

If you run Correctness on the second variant (where the iteration task is at the very beginning of the loop body), Correctness should be OK.
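To make the placement concrete, here is a minimal sketch of the second variant on a simplified loop. It is not the code from the question: `processSamples` and `expensiveWork` are hypothetical stand-ins (the latter for localizeKeypoint), and the no-op macro fallback exists only so the sketch compiles when `advisor-annotate.h` is not on the include path. The braces around the `if` body are a separate precaution: in a brace-less `if`, any statement inserted between the condition and the call becomes the `if` body, and the call then runs on every iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use the real Advisor header when available; otherwise fall back to no-op
// macros (an assumption made for this sketch only).
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define SKETCH_HAVE_ADVISOR 1
#  endif
#endif
#ifndef SKETCH_HAVE_ADVISOR
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END(...)
#  define ANNOTATE_ITERATION_TASK(name)
#endif

// Hypothetical stand-in for localizeKeypoint: only counts how often it runs.
static void expensiveWork(float /*val*/, std::size_t& calls) { ++calls; }

// Returns how many samples triggered the expensive call.
std::size_t processSamples(const std::vector<float>& samples, float threshold) {
    assert(threshold >= 0.0f);
    std::size_t calls = 0;
    ANNOTATE_SITE_BEGIN(solve);
    for (std::size_t i = 0; i < samples.size(); i++) {
        // First statement of the loop body: the whole iteration is the task.
        ANNOTATE_ITERATION_TASK(localizeKeypoint);
        const float val = samples[i];
        if (val > threshold || val < -threshold) {
            // Braces keep the call conditional even if lines are added here.
            expensiveWork(val, calls);
        }
    }
    ANNOTATE_SITE_END();
    return calls;
}
```

For example, calling `processSamples({0.5f, 2.0f, -3.0f, 0.0f}, 1.0f)` runs the stand-in only for the two samples outside the threshold band.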

Your second Suitability chart is not horrible. It just says that you have to take care of task chunking (click the "chunking" link in the tool to learn more about it). Fortunately, in recent OpenMP versions chunking is "good enough" by default; see https://software.intel.com/en-us/articles/openmp-loop-scheduling . So to see the Advisor projection with chunking ON, you just need to enable the corresponding check-box, and it will not look that bad.
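As a rough illustration of what chunking means once the loop is actually parallelized, the sketch below assumes the loop is converted to an OpenMP `parallel for` and sets the chunk size explicitly with a `schedule` clause. The function `sumCosts`, its `chunk` parameter, and the dynamic schedule are illustrative choices, not something the question or Advisor prescribes. Without OpenMP enabled at compile time the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: sums per-iteration "costs". Each thread receives
// `chunk` consecutive iterations at a time instead of one, which amortizes
// the scheduling overhead when individual iterations are short.
long sumCosts(const std::vector<int>& costs, int chunk) {
    assert(chunk > 0);
    long total = 0;
    const long n = static_cast<long>(costs.size());
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (long i = 0; i < n; i++) {
        total += costs[static_cast<std::size_t>(i)];
    }
    return total;
}
```

Leaving out the `schedule` clause lets the OpenMP runtime pick its default, which is the case the linked article describes.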
