
OpenMP parallelization with array elements

I've been playing around with OpenMP, and am trying to see if I can get a speedup in a particular bit of C++ code.

    #pragma omp parallel for
    for (Index j=alignedSize; j<size; ++j)
    {
      res[j] = cj.pmadd(lhs0(j), pfirst(ptmp0), res[j]);
      res[j] = cj.pmadd(lhs1(j), pfirst(ptmp1), res[j]);
      res[j] = cj.pmadd(lhs2(j), pfirst(ptmp2), res[j]);
      res[j] = cj.pmadd(lhs3(j), pfirst(ptmp3), res[j]);
    }

I'm a complete newbie with OpenMP so be gentle with me, but could someone shed some light on why this code ends up doubling the execution time rather than speeding it up?

I'm running with 4 cores, just in case that matters.

What is the size of a res entry? If it's less than the size of a cache line, then it's likely false sharing: threads writing to neighbouring elements that sit in the same cache line keep invalidating each other's copy of that line.

On a typical CPU you need chunks of at least 128 bytes per thread, and then you need a unified last-level cache.
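For what it's worth, here is a rough sketch of what that chunking advice could look like. It is not the Eigen code from the question: `scaled_add` and `kChunk` are made-up names, the element type is assumed to be `float`, and the 128-byte figure is taken from the comment above rather than measured. It also assumes `res` starts on a cache-line boundary, which `std::vector` does not guarantee, so treat it as an illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 128 bytes per chunk, so 32 floats go to one thread at a time.
constexpr int kChunk = static_cast<int>(128 / sizeof(float));

// Hypothetical stand-in for the loop in the question: res[j] += lhs[j] * scale.
void scaled_add(const std::vector<float>& lhs, float scale, std::vector<float>& res)
{
    assert(lhs.size() == res.size());
    const long n = static_cast<long>(res.size());

    // schedule(static, kChunk) gives each thread runs of kChunk consecutive
    // elements, so neighbouring threads write to different 128-byte blocks
    // (provided res is suitably aligned). Without OpenMP enabled the pragma
    // is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(static, kChunk)
    for (long j = 0; j < n; ++j)
    {
        res[j] += lhs[j] * scale;
    }
}
```

Build with your compiler's OpenMP switch (for example `-fopenmp` with GCC or Clang) and time it against the serial build. If the loop only covers a handful of elements, the cost of starting the threads can outweigh any gain regardless of the chunk size.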
