Understanding the method for OpenCL reduction on float
Following this link, I am trying to understand how the kernel code works. There are two versions of this kernel code, one with volatile local float *source and the other with volatile global float *source, i.e. the local and global versions. Below I take the local version:
float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand correctly, each work-item shares access to the source variable thanks to the "volatile" qualifier, doesn't it?
Afterwards, if I take one work-item, the code adds the operand value into the newVal.floatVal variable. Then, after this operation, I call the atomic_cmpxchg function, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.
During this atomic operation (which is uninterruptible by definition), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is encoded on 4 bytes, like an integer).
Can we say that each work-item has mutex access (I mean locked access) to the value located at the source address?
But for each work-item thread, is there only one iteration of the while loop?
I think there will be one iteration because the comparison "*source== prevVal.int ? newVal.intVal : newVal.intVal" will always assign the newVal.intVal value to the value stored at the source address, won't it?
Any help is welcome because I have not understood all the subtleties of this trick in this kernel code.
UPDATE 1:
Sorry, I now understand almost all the subtleties, especially in the while loop:
First case: for a given single thread, before the call to atomic_cmpxchg, if prevVal.floatVal is still equal to *source, then atomic_cmpxchg changes the value that source points to and returns the old value, which is equal to prevVal.intVal, so we break out of the while loop.
Second case: if, between the prevVal.floatVal = *source; instruction and the call to atomic_cmpxchg, the value *source has changed (by another thread?), then atomic_cmpxchg returns an old value which is no longer equal to prevVal.floatVal, so the while loop condition is true and we stay in the loop until that condition no longer holds.
Is my interpretation correct?
Thanks
> If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, it forces a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
    int old = *ptr;
    if (old == expected) {
        *ptr = new;
    }
    return old;
}
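The semantics above can be tried out on the host in plain C11. The following is only a sketch: cmpxchg_u32 is a hypothetical wrapper name (it is not an OpenCL function), and it assumes that C11's atomic_compare_exchange_strong is an acceptable stand-in for the OpenCL built-in.

```c
#include <stdatomic.h>

/* Hypothetical host-side stand-in for OpenCL's atomic_cmpxchg:
   stores `desired` only if *ptr == expected, and always returns
   the value that was in *ptr before the call. */
static unsigned int cmpxchg_u32(_Atomic unsigned int *ptr,
                                unsigned int expected,
                                unsigned int desired)
{
    /* On failure, C11 writes the value it actually observed into
       `expected`; on success, `expected` already equals the old value.
       Either way, `expected` ends up holding the old contents of *ptr. */
    atomic_compare_exchange_strong(ptr, &expected, desired);
    return expected;
}
```

If the returned value equals the expected argument, the store happened; otherwise the location was left untouched, which is exactly the test performed by the while condition in the kernel.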
To summarize, each thread, individually, does:
1. Load the current value of *source from memory into prevVal.floatVal.
2. Compute the desired value of *source in newVal.floatVal.
3. Execute the atomic compare-exchange described above on the type-punned integer values.
4. If the result of atomic_cmpxchg == prevVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
The above loop eventually terminates, because eventually, each thread succeeds in doing their atomic_cmpxchg (a failed attempt only happens when another thread's update landed first, so the group as a whole always makes progress).
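The retry loop summarized above can be sketched in portable C11 as follows. This is an illustrative host-side analog, not the kernel from the question: atomic_add_float, float_to_bits and bits_to_float are hypothetical names, the float is assumed to be stored as raw bits in a 32-bit atomic, and memcpy replaces the union for type punning. The function returns the number of loop iterations so the "how many iterations?" question can be observed directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helpers: reinterpret a float as its bit pattern and back. */
static uint32_t float_to_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Hypothetical analog of atomic_add_local: atomically adds `operand`
   to the float stored in *target. Returns how many attempts were needed. */
static int atomic_add_float(_Atomic uint32_t *target, float operand)
{
    int attempts = 0;
    uint32_t prev_bits, new_bits;
    do {
        attempts++;
        prev_bits = atomic_load(target);                        /* step 1 */
        new_bits = float_to_bits(bits_to_float(prev_bits) + operand); /* step 2 */
        /* steps 3 and 4: publish only if *target still holds prev_bits */
    } while (!atomic_compare_exchange_strong(target, &prev_bits, new_bits));
    return attempts;
}
```

With no other thread touching the location, the loop body runs exactly once. A second iteration happens only when another thread changed the value between the load and the compare-exchange.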
> Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics), but this is not one.
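For contrast, a lock-based version of the same addition might look like the sketch below, again in host-side C11 rather than OpenCL. The locked_float type and locked_add function are hypothetical names used only for illustration. Here a thread must first acquire the flag and every other thread spins until it is released, whereas in the compare-exchange loop no thread ever holds anything.

```c
#include <stdatomic.h>

/* Hypothetical type: a float guarded by a spinlock. */
typedef struct {
    atomic_flag locked;
    float value;
} locked_float;

/* Hypothetical lock-based add: only the thread holding the flag
   may touch `value`; all others busy-wait. */
static void locked_add(locked_float *cell, float operand)
{
    /* Acquire: spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&cell->locked,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
    cell->value += operand;   /* critical section */
    /* Release: let another thread in. */
    atomic_flag_clear_explicit(&cell->locked, memory_order_release);
}
```

If the lock holder stalls inside the critical section, everyone else is stuck. In the lock-free loop a stalled thread delays only its own update.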