OpenMP parallelization with array elements

Question

I've been playing around with OpenMP, and am trying to see if I can get a speedup in a particular bit of C++ code.

    #pragma omp parallel for
    for (Index j=alignedSize; j<size; ++j)
    {
      res[j] = cj.pmadd(lhs0(j), pfirst(ptmp0), res[j]);
      res[j] = cj.pmadd(lhs1(j), pfirst(ptmp1), res[j]);
      res[j] = cj.pmadd(lhs2(j), pfirst(ptmp2), res[j]);
      res[j] = cj.pmadd(lhs3(j), pfirst(ptmp3), res[j]);
    }

I'm a complete newbie with OpenMP so be gentle with me, but could someone shed some light on why this code ends up doubling the execution time rather than speeding it up?

I'm running with 4 cores, just in case that matters.

Answer 1

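What is the size of a res entry? If it's less than the size of a cache line, then it's likely false sharing: neighbouring entries written by different threads sit on the same cache line, so the cores keep invalidating each other's copy of it.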
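To make that check concrete, here is a minimal sketch. The 64-byte line size is an assumption (common on x86, but not stated anywhere in this thread), and the helper names `elements_per_cache_line` and `neighbours_may_share_line` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: 64-byte cache lines. Check your own CPU; this value does not
// come from the question.
const std::size_t kAssumedCacheLine = 64;

// Hypothetical helper: how many consecutive elements of type T fit into one
// cache line of line_bytes bytes (at least 1).
template <typename T>
std::size_t elements_per_cache_line(std::size_t line_bytes = kAssumedCacheLine) {
    assert(line_bytes > 0);
    return sizeof(T) >= line_bytes ? 1 : line_bytes / sizeof(T);
}

// Hypothetical helper: true when two neighbouring elements can land on the
// same cache line. That is the precondition for false sharing if different
// threads write to them.
template <typename T>
bool neighbours_may_share_line(std::size_t line_bytes = kAssumedCacheLine) {
    assert(line_bytes > 0);
    return sizeof(T) < line_bytes;
}
```

With a 4-byte `float` as the res entry, `elements_per_cache_line<float>()` gives 16 under this assumption, so up to 16 adjacent `res[j]` values could be fought over by different threads.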

Answer 2

On a typical CPU the minimum you need is chunks of 128 bytes per thread, and on top of that you need a unified last-level cache.
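One way to hand each thread 128-byte chunks is a static schedule with an explicit chunk size. The sketch below is not the original code: `scaled_add` is a hypothetical stand-in for the loop in the question, using plain `double` arrays instead of the `pmadd` calls. It also assumes the array starts on a 128-byte boundary; otherwise the chunks are the right size but can still straddle lines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 128 bytes per chunk, following the figure given in the answer above.
const std::size_t kChunkBytes = 128;

// Hypothetical stand-in for the question's loop: res[j] += a * x[j].
void scaled_add(std::vector<double>& res, const std::vector<double>& x, double a) {
    assert(res.size() == x.size());

    // Number of elements that make up one 128-byte chunk.
    const int chunk = static_cast<int>(kChunkBytes / sizeof(double));
    const long n = static_cast<long>(res.size());

    // schedule(static, chunk) deals out blocks of `chunk` consecutive
    // iterations round-robin, so each thread writes whole 128-byte blocks
    // of res. Without OpenMP enabled the pragma is ignored and the loop
    // simply runs serially.
    #pragma omp parallel for schedule(static, chunk)
    for (long j = 0; j < n; ++j) {
        res[j] += a * x[j];
    }
}
```

Build with your compiler's OpenMP flag (for example `-fopenmp` on GCC or Clang) and time it against the serial version. If the loop only covers a handful of elements, the cost of starting the threads may still outweigh any gain.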

Answer 1: score 2, accepted, 2016-12-17 19:08:29

Answer 2: score 0, 2016-12-17 23:51:40