OpenMP global memory fence / barrier

Does OpenMP with target offloading on the GPU include a global memory fence / global barrier, similar to OpenCL's?

barrier(CLK_GLOBAL_MEM_FENCE);

I've tried using a barrier inside a teams construct:

#pragma omp target teams
{
    // Some initialization...

    #pragma omp distribute parallel for
    for (size_t i = 0; i < N; i += 1)
    {
        // Some work...
    }

    #pragma omp barrier

    #pragma omp distribute parallel for
    for (size_t i = 0; i < N; i += 1)
    {
        // Some other work depending on the previous loop
    }
}

However, it seems that the barrier only works within a team, equivalent to:

barrier(CLK_LOCAL_MEM_FENCE);

I would like to avoid splitting the kernel in two, so that team-local data does not have to be written to global memory just to be loaded again.

Edit: I've been able to enforce the desired behavior using a global atomic counter and busy-waiting in the teams. However, this doesn't seem like a good solution, and I'm still wondering whether there is a better way to do this using proper OpenMP.

A barrier construct only synchronizes threads in the current team. Synchronization between threads from different thread teams launched by a teams construct is not available. OpenMP's execution model doesn't guarantee that such threads will even execute concurrently, so using atomic constructs to synchronize between the threads will not work in general:

"Whether the initial threads concurrently execute the teams region is unspecified, and a program that relies on their concurrent execution for the purposes of synchronization may deadlock."

Note that the OpenCL barrier call only provides synchronization within a workgroup, even with the CLK_GLOBAL_MEM_FENCE argument. See "Barriers in OpenCL" for more information on the semantics of CLK_GLOBAL_MEM_FENCE versus CLK_LOCAL_MEM_FENCE.
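
Since no barrier spans teams, one portable way to order the two loops is to launch them as two separate target regions and keep the intermediate data resident on the device with a target data region. The following is only a minimal sketch of that idea, not the asker's actual code: the function name two_phase, the arrays in, tmp and out, and the arithmetic are all made up for illustration. Note that it does not keep team-local data out of global memory; the intermediate array still lives in device global memory, it just never travels back to the host.

```c
#include <stddef.h>

/* Hypothetical two-phase computation (illustrative names and arithmetic).
 * Phase 2 reads a neighbour's phase-1 result, so every phase-1 iteration,
 * in every team, has to finish before phase 2 starts. */
void two_phase(const double *in, double *tmp, double *out, size_t n)
{
    /* Keep the arrays on the device across both kernels. tmp is mapped
     * with alloc, so it is device-side scratch space that is never
     * copied to or from the host. */
    #pragma omp target data map(to: in[0:n]) map(alloc: tmp[0:n]) map(from: out[0:n])
    {
        /* Kernel 1. Without a nowait clause the host waits for this
         * target region to complete, which orders it before kernel 2
         * across all teams. */
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++)
            tmp[i] = 2.0 * in[i];

        /* Kernel 2: may safely read any element written by kernel 1. */
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++)
            out[i] = tmp[i] + tmp[(i + 1) % n];
    }
}
```

The cost compared with a single kernel is one extra kernel launch plus the round trip of the intermediate values through device global memory. In exchange, the code does not depend on teams running concurrently, which is the assumption that makes the busy-waiting approach unsafe.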
