
OpenMP: Huge performance differences between Visual C++ 2008 and 2010

I'm running a camera acquisition program that performs processing on acquired images, and I'm using simple OpenMP directives for this processing. So basically I wait for an image from the camera, and then process it.

When migrating to VC2010, I see a very strange performance hog: under VC2010 my app takes nearly 100% CPU, while it takes only 10% under VC2008.

If I benchmark only the processing code, I get no difference between VC2010 and VC2008; the difference occurs when using the acquisition functions.

I have reduced the code needed to reproduce the problem to a simple loop that does the following:

  for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

    long long sum = 0;//do some simple OpenMP parallel loop
    #pragma omp parallel for reduction(+:sum)
    for (int j=0; j<size; ++j)
      sum += my_array[j];
  }

This loop eats 5% of CPU with 2008, and 70% with 2010.

I've done some profiling, which shows that in 2010 most of the time is spent in OpenMP's vcomp100.dll!_vcomp::PartialBarrierN::Block

I have also done some concurrency profiling:

In 2008, processing work is distributed over 3 worker threads, which are only very lightly active because processing time is much shorter than image waiting time.

The same threads appear in 2010, but they are all 100% occupied by the PartialBarrierN::Block function. As I have four cores, they are eating 75% of the CPU, which is roughly what I see in the CPU usage.

So it looks like there is a conflict between OpenMP and the Matrox acquisition library (proprietary). But is it a bug in VS2010 or in Matrox? Is there anything I can do? Using VC++2010 is mandatory for me, so I cannot just stick with 2008.

Big thanks

STATUS UPDATE

Using the new concurrency framework, as suggested by DeadMG, leads to 40% CPU. Profiling shows that the time is spent in processing, so it doesn't exhibit the bug I'm seeing with OpenMP, but performance in my case is much poorer than with OpenMP.

STATUS UPDATE 2

I have installed an evaluation version of the latest Intel C++. It shows exactly the same performance problems!

I cross-posted to the MSDN forum.

STATUS UPDATE 3

Tested on Windows 7 64 bits and XP 32 bits, with the exact same results (on the same machine).

In 2010 OpenMP, each worker thread does a spin-wait of about 200 ms after task completion. In my case, with an I/O wait followed by a repetitive OpenMP task, this massively loads the CPU.

The solution is to change this behaviour; Intel C++ has an extension routine for this, kmp_set_blocktime(). However, Visual 2010 offers no such possibility.

In this Autodesk note they talk about the problem for Intel C++. That compiler first introduced the behavior, but allows changing it (see above). Visual 2010 switched to it, but without a workaround like Intel's.

So to sum it up, switching to Intel C++ and using kmp_set_blocktime(0) solved it.
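To make the mechanism concrete, here is a minimal conceptual sketch of a "spin, then block" wait in standard C++. It is not the vcomp or Intel implementation; `WorkSignal` and its members are hypothetical names chosen for illustration. The `block_time` argument plays the role that the block time plays in the description above: a long value keeps an idle worker polling (and burning a core), while a value of 0 sends it straight to a blocking wait.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper for illustration only; not part of any OpenMP runtime.
struct WorkSignal {
    std::atomic<bool> ready{false};
    std::mutex m;
    std::condition_variable cv;

    // Called by the producer when new work is available.
    void post() {
        {
            std::lock_guard<std::mutex> lock(m);
            ready.store(true);
        }
        cv.notify_all();
    }

    // Spin for up to block_time, then sleep until post() has been called.
    // Returns true if the work was seen during the spin phase.
    bool wait(std::chrono::milliseconds block_time) {
        const auto deadline = std::chrono::steady_clock::now() + block_time;
        while (std::chrono::steady_clock::now() < deadline) {
            if (ready.load()) return true;   // polling: uses CPU the whole time
        }
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return ready.load(); });  // blocked: no CPU used
        return false;
    }
};
```

With an image arriving only every so often, a worker calling `wait(std::chrono::milliseconds(200))` would spend most of its life in the polling loop, which matches the profile described in the question; `wait(std::chrono::milliseconds(0))` trades a slightly slower wake-up for an idle CPU.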

Thanks to John Lilley from DataLever Corporation on the other MSDN thread.

The issue has been submitted to MS Connect, and received the "won't fix" feedback.

With OpenMP 3.0 the spin-wait can be deactivated via OMP_WAIT_POLICY:

_putenv_s( "OMP_WAIT_POLICY", "PASSIVE" );

The effect is basically the same as with kmp_set_blocktime(0), but since we set the environment variable OMP_WAIT_POLICY at runtime, it only affects the current process and its child processes.

Of course OMP_WAIT_POLICY can also be set by a launcher application, e.g. Blender handles it that way.

A hotfix for VC2010 is available; later versions like VC2013 support it directly.

You could try the new Concurrency Runtime that ships with VS2010, just starting with your test sample.

That is,

for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

    long long sum = 0;//do some simple OpenMP parallel loop
    #pragma omp parallel for reduction(+:sum)
    for (int j=0; j<size; ++j)
      sum += my_array[j];
  }

would become

for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

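    // One accumulator per thread; long long matches the OpenMP version (needs <ppl.h>)
    Concurrency::combinable<long long> partial;
    // Sum in chunks of 1000; any remainder when size is not a multiple of 1000 is skipped
    Concurrency::parallel_for(0, size / 1000, [&](int chunk) {
      for(int k = 0; k < 1000; k++)
          partial.local() += my_array[(chunk * 1000) + k];
    });
    long long sum = partial.combine([](long long a, long long b) { return a + b; });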
  }

I tested another acquisition board, and the problem is identical, so the culprit is VC++2010. Microsoft made OpenMP implementation changes that screw up programs like mine, as a thread on the MSDN forums shows.
