Understanding the method for OpenCL reduction on float

Following this link, I am trying to understand how the kernel code works. There are two versions of this kernel code, one with volatile local float *source and the other with volatile global float *source, i.e. local and global versions. Below I take the local version:

float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;

    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;

    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

If I understand correctly, every work-item shares access to the source variable thanks to the qualifier "volatile", doesn't it?

Next, taking a single work-item, the code adds the operand value to prevVal.floatVal and stores the result in newVal.floatVal. After this operation, I call the atomic_cmpxchg function, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.

During this atomic operation (which by definition cannot be interrupted), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is encoded on 4 bytes, like an integer).

Can we say that each work-item has mutex access (I mean locked access) to the value located at the source address?

But for each work-item thread, is there only one iteration of the while loop?

I think there will be one iteration, because the comparison "*source == prevVal.intVal ? newVal.intVal : newVal.intVal" will always assign the newVal.intVal value to the value stored at the source address, won't it?

Any help is welcome, because I have not understood all the subtleties of the trick in this kernel code.

UPDATE 1:

Sorry, I now understand almost all the subtleties, especially in the while loop:

First case: for a given single thread, if prevVal.floatVal is still equal to *source when atomic_cmpxchg is called, then atomic_cmpxchg changes the value that source points to and returns the old value, which is equal to prevVal.intVal, so we break out of the while loop.

Second case: if, between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg, the value *source has changed (by another thread?), then atomic_cmpxchg returns the old value, which is no longer equal to prevVal.floatVal, so the while condition is true and we stay in the loop until that condition no longer holds.

Is my interpretation correct?

Thanks

If I understand correctly, every work-item shares access to the source variable thanks to the qualifier "volatile", doesn't it?

volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, it forces a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).

do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);

After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:

int atomic_cmpxchg(int *ptr, int expected, int new)

What it does is:

atomically {
    int old = *ptr;

    if (old == expected) {
        *ptr = new;
    }

    return old;
}
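For readers who want to try these semantics outside a kernel, here is a minimal host-side sketch in C11. It is an analogue, not OpenCL code: cmpxchg_like and cmpxchg_demo are hypothetical names, and the sketch assumes that C11's atomic_compare_exchange_strong is an acceptable stand-in for atomic_cmpxchg. The one difference it has to bridge is that the C11 call returns a success flag and writes the observed value back into its expected argument, while atomic_cmpxchg returns the old value directly.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical host-side stand-in for OpenCL's atomic_cmpxchg.
   Returns the value that was in *ptr before the call and stores
   `desired` only if that value was equal to `expected`. */
static unsigned int cmpxchg_like(_Atomic unsigned int *ptr,
                                 unsigned int expected,
                                 unsigned int desired) {
    /* On failure, C11 overwrites `expected` with the value it observed.
       On success, `expected` already holds the old value. Either way,
       `expected` ends up holding what was in *ptr before the call. */
    atomic_compare_exchange_strong(ptr, &expected, desired);
    return expected;
}

/* Shows both outcomes: a matching and a non-matching expected value. */
static void cmpxchg_demo(void) {
    _Atomic unsigned int x;
    atomic_init(&x, 5u);

    unsigned int first = cmpxchg_like(&x, 5u, 9u);   /* match: x becomes 9 */
    unsigned int second = cmpxchg_like(&x, 5u, 1u);  /* mismatch: x stays 9 */

    assert(first == 5u);
    assert(second == 9u);
    assert(atomic_load(&x) == 9u);
}
```

In both cases the caller learns what was stored before the call, so comparing the return value with expected is enough to tell whether the store happened.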

To summarize, each thread, individually, does:

  1. Load the current value of *source from memory into prevVal.floatVal
  2. Compute the desired value of *source in newVal.floatVal
  3. Execute the atomic compare-exchange described above (using the type-punned values)
  4. If the result of atomic_cmpxchg == prevVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.

The above loop eventually terminates, because eventually each thread succeeds in doing its atomic_cmpxchg.
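The retry behaviour can be reproduced on the host with a small C11 sketch. It is only an analogue of the kernel: float_to_bits, bits_to_float, try_add, atomic_add_float and interleaving_demo are hypothetical names, memcpy stands in for the unions, and the "other work-item" is simulated on a single thread so that the interleaving is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bits between float and uint32_t (the role of the unions). */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Steps 2 to 4 for one attempt: true means the exchange took place. */
static bool try_add(_Atomic uint32_t *source, uint32_t prev_bits, float operand) {
    uint32_t new_bits = float_to_bits(bits_to_float(prev_bits) + operand);
    return atomic_compare_exchange_strong(source, &prev_bits, new_bits);
}

/* The full loop: reload (step 1) and retry until one attempt succeeds.
   Returns the number of attempts that were needed. */
static int atomic_add_float(_Atomic uint32_t *source, float operand) {
    int attempts = 0;
    uint32_t prev_bits;
    do {
        prev_bits = atomic_load(source);
        ++attempts;
    } while (!try_add(source, prev_bits, operand));
    return attempts;
}

/* Simulated interleaving of two work-items, A and B, on one thread. */
static float interleaving_demo(void) {
    _Atomic uint32_t source;
    atomic_init(&source, float_to_bits(1.0f));

    uint32_t stale = atomic_load(&source);       /* A reads 1.0 */
    atomic_add_float(&source, 2.0f);             /* B adds 2.0 in between */
    bool ok = try_add(&source, stale, 5.0f);     /* A's exchange sees 3.0, not 1.0 */
    assert(!ok);                                 /* so nothing was stored */
    atomic_add_float(&source, 5.0f);             /* A retries from a fresh read */
    return bits_to_float(atomic_load(&source));  /* 1.0 + 2.0 + 5.0 */
}
```

A failed attempt never loses an update: A's stale sum (6.0) is discarded and recomputed from the value B left behind. The comparison is made on the bit patterns, not on the float values, which is why the integer view of the storage is used.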

Can we say that each work-item has mutex access (I mean locked access) to the value located at the source address?

Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics), but this is not one.
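For contrast, this is roughly what a lock-based version looks like, again as a host-side C11 sketch and not as OpenCL code. The names spin_mutex, spin_lock, spin_unlock, locked_add and spin_demo are hypothetical, and the lock is built on C11's atomic_flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type built on a single atomic flag. */
typedef struct { atomic_flag held; } spin_mutex;

static void spin_lock(spin_mutex *m) {
    /* Busy-wait until the flag was previously clear, i.e. we acquired it. */
    while (atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire)) {
    }
}

static void spin_unlock(spin_mutex *m) {
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/* Lock-based addition: the float itself is an ordinary variable,
   and exclusivity comes entirely from holding the lock. */
static void locked_add(spin_mutex *m, float *target, float operand) {
    spin_lock(m);
    *target += operand;
    spin_unlock(m);
}

static float spin_demo(void) {
    spin_mutex m = { ATOMIC_FLAG_INIT };
    float total = 1.0f;
    locked_add(&m, &total, 2.5f);

    /* After locked_add returns, the lock must be free again. */
    bool was_held = atomic_flag_test_and_set(&m.held);
    assert(!was_held);
    return total;
}
```

The difference in design is where waiting happens. Here a thread that cannot take the lock makes no progress until the holder releases it. In the compare-exchange loop nobody ever holds anything: an attempt only fails because some other thread's exchange succeeded in the meantime.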
