Following this link, I am trying to understand how this kernel code works. There are two versions of it, one with volatile local float *source and the other with volatile global float *source, i.e. a local and a global version. Below I take the local version:
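float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
    // Two views of the same 32 bits: unsigned int (for atomic_cmpxchg) and float (for the addition).
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        prevVal.floatVal = *source;                    // snapshot the current value
        newVal.floatVal = prevVal.floatVal + operand;  // compute the sum on a private copy
        // Store newVal only if *source still holds the snapshot; otherwise retry.
    } while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}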
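To run the same loop on a CPU, here is a minimal C11 sketch, assuming a host compiler that provides <stdatomic.h>. The float is stored in an _Atomic uint32_t cell, and cmpxchg_u32 is a hypothetical helper that mimics the contract of OpenCL's atomic_cmpxchg (store only on a match, always return the old value). The names atomic_add_float, float_to_bits and bits_to_float are illustrative, not part of any OpenCL API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* OpenCL guarantees 32-bit float and unsigned int; on the host we check it. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Same trick as the kernel: one 32-bit pattern seen as an integer or a float. */
typedef union {
    uint32_t intVal;
    float floatVal;
} FloatBits;

/* Hypothetical helper mimicking OpenCL's atomic_cmpxchg:
   store val only if *p == cmp, and return the old value either way. */
static uint32_t cmpxchg_u32(_Atomic uint32_t *p, uint32_t cmp, uint32_t val) {
    uint32_t expected = cmp;
    /* On failure, C11 copies the current value of *p into expected,
       so expected holds the old value whether or not the store happened. */
    atomic_compare_exchange_strong(p, &expected, val);
    return expected;
}

/* Host-side analogue of atomic_add_local. */
static void atomic_add_float(_Atomic uint32_t *source, float operand) {
    FloatBits prevVal, newVal;
    do {
        prevVal.intVal = atomic_load(source);          /* fresh read on every attempt */
        newVal.floatVal = prevVal.floatVal + operand;  /* private computation */
    } while (cmpxchg_u32(source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

/* Conversions between a float value and its bit pattern. */
static uint32_t float_to_bits(float f) { FloatBits b; b.floatVal = f; return b.intVal; }
static float bits_to_float(uint32_t u) { FloatBits b; b.intVal = u; return b.floatVal; }
```

For example, starting from a cell holding the bits of 1.5f, atomic_add_float(&cell, 2.25f) leaves the bits of 3.75f in it.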
If I understand correctly, each work-item shares access to the source variable thanks to the volatile qualifier, doesn't it?
Next, for a given work-item, the code adds the operand value and stores the result in the newVal.floatVal variable. After this operation, I call the atomic_cmpxchg function, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.
During this atomic operation (which is uninterruptible by definition), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes, like an integer).
Can we say that each work-item has mutex access (I mean locked access) to the value located at address source?
But for each work-item thread, is there only one iteration of the while loop?
I think there will be only one iteration, because the comparison *source == prevVal.intVal ? newVal.intVal : newVal.intVal will always assign the value newVal.intVal to the value stored at address source, won't it?
Any help is welcome because I have not understood all the subtleties of this trick for this kernel code.
UPDATE 1:
Sorry, I think I now understand almost all the subtleties, especially in the while loop:
First case: for a given thread, if prevVal.floatVal is still equal to *source just before the call to atomic_cmpxchg, then atomic_cmpxchg writes newVal.intVal to the location pointed to by source and returns the old value, which is equal to prevVal.intVal, so we exit the while loop.
Second case: if the value *source has changed (because of another thread?) between the prevVal.floatVal = *source; instruction and the call to atomic_cmpxchg, then atomic_cmpxchg leaves *source unchanged and returns the old value, which is no longer equal to prevVal.intVal. The while condition is therefore true, and we go around the loop again (re-reading *source) until the exchange succeeds.
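Both cases can be reproduced deterministically in a single-threaded C11 sketch. add_float_counting is a hypothetical, instrumented copy of the loop that returns how many compare-exchanges it needed, and the hypothetical hook add_ten_once plays the role of "another thread" by writing to *source once, between the read and the compare-exchange.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef union { uint32_t intVal; float floatVal; } FloatBits;

/* Hypothetical hook called between the read and the compare-exchange,
   standing in for another work-item; NULL means no interference. */
typedef void (*interfere_fn)(_Atomic uint32_t *source, int attempt);

/* The kernel's CAS loop, instrumented to return the number of attempts. */
static int add_float_counting(_Atomic uint32_t *source, float operand, interfere_fn interfere) {
    FloatBits prevVal, newVal;
    int attempts = 0;
    for (;;) {
        attempts++;
        prevVal.intVal = atomic_load(source);
        newVal.floatVal = prevVal.floatVal + operand;
        if (interfere != NULL) interfere(source, attempts);
        uint32_t expected = prevVal.intVal;
        /* First case: *source still holds the value we read, so the store happens. */
        if (atomic_compare_exchange_strong(source, &expected, newVal.intVal))
            return attempts;
        /* Second case: *source changed; expected now holds the newer value,
           which differs from prevVal.intVal, so we loop and re-read. */
    }
}

/* Hypothetical "other work-item": on the first attempt only, adds 10.0f to *source. */
static void add_ten_once(_Atomic uint32_t *source, int attempt) {
    if (attempt != 1) return;
    FloatBits v;
    v.intVal = atomic_load(source);
    v.floatVal += 10.0f;
    atomic_store(source, v.intVal);
}
```

Starting from 1.0f, add_float_counting(&cell, 2.0f, NULL) needs one attempt (first case) and leaves 3.0f. Calling it again with add_ten_once makes the first compare-exchange fail; the loop re-reads 13.0f and the second attempt stores 15.0f (second case), so neither addition is lost.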
Is my interpretation correct?
Thanks
If I understand correctly, each work-item shares access to the source variable thanks to the volatile qualifier, doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific memory location (in other words, it forces a load/store at each read/write of that location). It has no impact on the ownership of the underlying storage: source is shared because it points into local memory, which all work-items of a work-group can access (or into global memory, in the other version). Here, volatile is used to force the compiler to re-read *source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which would break the algorithm).
do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
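As an aside, C11 atomics allow a formulation where volatile is not needed at all. The sketch below is an alternative to the OpenCL code, not what it does: the single explicit read uses atomic_load, and when atomic_compare_exchange_weak fails it writes the current value of *source back into its expected argument, which takes over the role of the re-read at the top of the loop. The function name atomic_add_float_c11 is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef union { uint32_t intVal; float floatVal; } FloatBits;

/* Adds operand to the float stored in *source and returns the value
   seen just before this thread's addition took effect. */
static float atomic_add_float_c11(_Atomic uint32_t *source, float operand) {
    FloatBits prevVal, newVal;
    prevVal.intVal = atomic_load(source);  /* the only explicit read */
    do {
        newVal.floatVal = prevVal.floatVal + operand;
        /* On failure (including a spurious failure of the weak form),
           prevVal.intVal is overwritten with the current *source,
           so the next iteration recomputes from fresh data. */
    } while (!atomic_compare_exchange_weak(source, &prevVal.intVal, newVal.intVal));
    return prevVal.floatVal;
}
```

The weak form is the usual choice inside a retry loop, since a spurious failure just costs one more iteration.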
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
    int old = *ptr;
    if (old == expected) {
        *ptr = new;
    }
    return old;
}
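This contract matches the OpenCL specification's description of atomic_cmpxchg: read old from *p, compute (old == cmp) ? val : old, store that result, and return old. When old differs from the expected value, old is written back, so *p is unchanged; this is where the reading "? newVal.intVal : newVal.intVal" from the question goes wrong. The C sketch below is a sequential, non-atomic model (cmpxchg_model is a hypothetical name), meant only to show which value ends up in *p and what is returned.

```c
#include <stdint.h>

/* Sequential (NOT atomic) model of atomic_cmpxchg, for illustration only:
   the real function performs these steps as one indivisible operation. */
static uint32_t cmpxchg_model(uint32_t *p, uint32_t cmp, uint32_t val) {
    uint32_t old = *p;
    *p = (old == cmp) ? val : old;  /* on mismatch, old is written back: no change */
    return old;                     /* the value seen before the operation, in both cases */
}
```

With x == 5, cmpxchg_model(&x, 5, 9) returns 5 and sets x to 9; a second call cmpxchg_model(&x, 5, 7) returns 9 and leaves x at 9. A return value different from the expected one is exactly the signal the kernel's while condition tests for.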
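To summarize, each thread, individually, does:
1. Load *source from memory into prevVal.floatVal.
2. Compute prevVal.floatVal + operand and store it in newVal.floatVal.
3. Attempt an atomic compare-exchange on *source, with prevVal.intVal as the expected value and newVal.intVal as the new value.
4. If atomic_cmpxchg returns prevVal.intVal, the compare-exchange was successful: break. Otherwise, the exchange didn't happen: go to 1 and try again.
The above loop eventually terminates, because a compare-exchange only fails when another thread's compare-exchange succeeded in the meantime. As long as each thread performs a bounded number of additions, each thread eventually succeeds in doing its atomic_cmpxchg.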
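The termination argument can be checked with a small C simulation. It assumes a lockstep schedule chosen for illustration: in each round, all unfinished work-items read *source, then issue their compare-exchanges one after another. simulate_lockstep and MAX_ITEMS are hypothetical names, and this is not a model of any particular GPU. When every addition changes the stored value, only the first compare-exchange of a round succeeds, so the additions are serialized: the k-th item (counting from 1) needs k attempts, and n items finish after n rounds.

```c
#include <stdint.h>

#define MAX_ITEMS 64  /* arbitrary simulation limit */

typedef union { uint32_t intVal; float floatVal; } FloatBits;

/* Sequential model of atomic_cmpxchg; atomicity is trivial in a single-threaded simulation. */
static uint32_t model_cmpxchg(uint32_t *p, uint32_t cmp, uint32_t val) {
    uint32_t old = *p;
    *p = (old == cmp) ? val : old;
    return old;
}

/* Runs n simulated work-items, each adding operand once to *source.
   attempts[i] receives how many compare-exchanges item i needed.
   Returns the number of lockstep rounds, or -1 if n is out of range. */
static int simulate_lockstep(float *source, float operand, int n, int attempts[]) {
    FloatBits cell, prevVal[MAX_ITEMS], newVal[MAX_ITEMS];
    int done[MAX_ITEMS] = {0};
    int remaining = n, rounds = 0;

    if (n < 0 || n > MAX_ITEMS) return -1;
    cell.floatVal = *source;
    for (int i = 0; i < n; i++) attempts[i] = 0;

    while (remaining > 0) {
        rounds++;
        /* Phase 1: every unfinished item reads *source and computes its sum. */
        for (int i = 0; i < n; i++) {
            if (done[i]) continue;
            prevVal[i].intVal = cell.intVal;
            newVal[i].floatVal = prevVal[i].floatVal + operand;
        }
        /* Phase 2: the same items issue their compare-exchanges in index order. */
        for (int i = 0; i < n; i++) {
            if (done[i]) continue;
            attempts[i]++;
            if (model_cmpxchg(&cell.intVal, prevVal[i].intVal, newVal[i].intVal) == prevVal[i].intVal) {
                done[i] = 1;  /* success: this item leaves the loop */
                remaining--;
            }                 /* failure: retry next round with a fresh read */
        }
    }
    *source = cell.floatVal;
    return rounds;
}
```

With 8 work-items each adding 1.0f to 0.0f, the result is 8.0f after 8 rounds, the first item needs 1 attempt and the last one needs 8.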
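Can we say that each work-item has mutex access (I mean locked access) to the value located at address source?
No. Mutexes are locks, while this is a lock-free algorithm: no work-item ever gets exclusive access to *source. Each one computes its update on a private copy, and atomic_cmpxchg only publishes it if nobody else modified *source in the meantime; otherwise the work-item retries. OpenCL can simulate mutexes with spinlocks (also implemented with atomics), but this is not one.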
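For contrast, the mutex-style approach mentioned above could look like the C11 sketch below: a test-and-set spinlock built on atomic_flag guards a plain float, so only the thread holding the flag touches the value. LockedFloat and locked_add are hypothetical names, and this is a host-side illustration only; on some GPUs, spinning on a lock held by another work-item of the same SIMD group can deadlock, which is one practical reason to prefer the lock-free loop.

```c
#include <stdatomic.h>

/* Hypothetical lock-protected float, for contrast with the lock-free loop. */
typedef struct {
    atomic_flag lock;  /* set while some thread is inside the critical section */
    float value;       /* plain float: only touched while holding the lock */
} LockedFloat;

static void locked_add(LockedFloat *lf, float operand) {
    while (atomic_flag_test_and_set(&lf->lock)) {
        /* spin until the current holder clears the flag */
    }
    lf->value += operand;          /* exclusive access: a real mutual-exclusion section */
    atomic_flag_clear(&lf->lock);  /* release the lock */
}
```

Here a thread that finds the flag set makes no progress until the holder releases it, whereas in the compare-exchange loop a failed attempt always means some other thread's addition went through.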