Understanding the method for OpenCL reduction on float

Question

Following this link, I am trying to understand how the kernel code below works. There are two versions of it, one taking volatile local float *source and the other taking volatile global float *source, i.e. a local and a global version. Below I take the local version:

float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;

    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;

    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

If I understand correctly, every work-item shares access to the source variable thanks to the "volatile" qualifier, doesn't it?

Then, if I take a single work-item, the code adds the operand value and stores the result in newVal.floatVal. After this operation, I call the atomic_cmpxchg function, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.

During this atomic operation (which is uninterruptible by definition), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is encoded on 4 bytes, like an integer).

Can we say that each work-item has mutex access (I mean locked access) to the value located at the source address?

But for each work-item, is there only one iteration of the while loop?

I think there will be one iteration, because the comparison "*source == prevVal.intVal ? newVal.intVal : newVal.intVal" will always assign newVal.intVal to the value stored at the source address, won't it?

Any help is welcome, because I have not understood all the subtleties of this trick in the kernel code.

UPDATE 1:

Sorry, I think I now understand almost all the subtleties, especially in the while loop:

First case: for a given single thread, if *source is still equal to prevVal.floatVal when atomic_cmpxchg is called, then atomic_cmpxchg changes the value that source points to and returns the old value, which is equal to prevVal.intVal, so we break out of the while loop.

Second case: if the value *source has changed (by another thread?) between the prevVal.floatVal = *source; instruction and the call to atomic_cmpxchg, then atomic_cmpxchg returns the old value, which is no longer equal to prevVal.intVal, so the while loop condition is true and we stay in the loop until that condition no longer holds.

Is my interpretation correct?

Thanks

Answer 1

> If I understand correctly, every work-item shares access to the source variable thanks to the "volatile" qualifier, doesn't it?

volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, it forces a load/store at each read/write of that memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).

do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);

After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:

int atomic_cmpxchg(int *ptr, int expected, int new)

What it does is:

atomically {
    int old = *ptr;

    if (old == expected) {
        *ptr = new;
    }

    return old;
}
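
The pseudocode above can be mirrored on the host to experiment with it. The following is a minimal sketch in C11, not OpenCL: cmpxchg_return_old and example_cmpxchg are hypothetical names chosen for illustration, and the sketch assumes that C11's atomic_compare_exchange_strong is an acceptable stand-in for the device-side built-in.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper (not an OpenCL function): reproduces the
   "return the old value" convention of atomic_cmpxchg using C11 atomics. */
static unsigned int cmpxchg_return_old(atomic_uint *ptr,
                                       unsigned int expected,
                                       unsigned int desired)
{
    /* C11 stores the value it actually found into `expected` on failure and
       leaves it untouched on success, so afterwards `expected` always holds
       the content of *ptr from just before the operation. */
    atomic_compare_exchange_strong(ptr, &expected, desired);
    return expected;
}

/* Usage: one exchange that succeeds, then one that fails. */
static void example_cmpxchg(void)
{
    atomic_uint cell;
    atomic_init(&cell, 5u);

    /* Expected value matches: the cell is updated and the old value comes back. */
    assert(cmpxchg_return_old(&cell, 5u, 9u) == 5u);
    assert(atomic_load(&cell) == 9u);

    /* Expected value is stale: the cell is left alone and its content is returned. */
    assert(cmpxchg_return_old(&cell, 5u, 7u) == 9u);
    assert(atomic_load(&cell) == 9u);
}
```

The caller detects success the same way the kernel does: the returned value equals the expected value it passed in.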

To summarize, each thread, individually, does:

1. Load the current value of *source from memory into prevVal.floatVal
2. Compute the desired value of *source in newVal.floatVal
3. Execute the atomic compare-exchange described above (using the type-punned values)
4. If the result of atomic_cmpxchg == prevVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
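
The same four steps can be sketched on the host in C11. This is only an analogue of the kernel function, written under a few assumptions: float is 32 bits wide, the float is stored as its bit pattern in an atomic 32-bit cell, and the names float_to_bits, bits_to_float, atomic_add_float and example_add are hypothetical. memcpy plays the role of the unions in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Type punning: copy the bits of a float into an integer and back. */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical host-side analogue of atomic_add_local. */
static void atomic_add_float(_Atomic uint32_t *cell, float operand)
{
    uint32_t prev, next;
    do {
        prev = atomic_load(cell);                              /* step 1 */
        next = float_to_bits(bits_to_float(prev) + operand);   /* step 2 */
        /* steps 3 and 4: store `next` only if the cell still holds `prev`;
           otherwise go around the loop and start again from step 1 */
    } while (!atomic_compare_exchange_strong(cell, &prev, next));
}

/* Usage: 2.0 + 1.5 is exactly representable, so the result can be compared exactly. */
static void example_add(void)
{
    _Atomic uint32_t cell;
    atomic_init(&cell, float_to_bits(2.0f));
    atomic_add_float(&cell, 1.5f);
    assert(bits_to_float(atomic_load(&cell)) == 3.5f);
}
```

Note that the comparison is done on the integer bit patterns, never on the float values, which is why the integer compare-exchange can be used for a float.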

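The above loop eventually terminates. Assuming every update of *source goes through this function, a compare-exchange can only fail because another thread's compare-exchange succeeded in the meantime, so some thread always makes progress, and eventually each thread succeeds in doing its atomic_cmpxchg.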
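
To see a retry happen without relying on real thread timing, the interference can be simulated deterministically. The sketch below is an illustration only: it is single-threaded C11, it uses unsigned integers instead of floats to stay short, and other_work_item, interference_left, cas_add_counting and example_retry are hypothetical names. The "other work-item" modifies the cell in the window between the load and the compare-exchange.

```c
#include <assert.h>
#include <stdatomic.h>

/* How many more times the simulated other work-item will interfere. */
static int interference_left;

/* Simulated other work-item: adds 100 to the cell while interference remains. */
static void other_work_item(atomic_uint *cell)
{
    if (interference_left > 0) {
        interference_left--;
        atomic_fetch_add(cell, 100u);
    }
}

/* Adds `operand` with a compare-exchange loop; returns the number of attempts. */
static int cas_add_counting(atomic_uint *cell, unsigned int operand)
{
    int attempts = 0;
    unsigned int prev, next;
    do {
        attempts++;
        prev = atomic_load(cell);
        next = prev + operand;
        other_work_item(cell);   /* window between the load and the exchange */
    } while (!atomic_compare_exchange_strong(cell, &prev, next));
    return attempts;
}

/* Usage: two interferences force two failed attempts before the third succeeds. */
static void example_retry(void)
{
    atomic_uint cell;
    atomic_init(&cell, 1u);
    interference_left = 2;
    assert(cas_add_counting(&cell, 5u) == 3);
    /* No update is lost: 1 + 100 + 100 + 5. */
    assert(atomic_load(&cell) == 206u);
}
```

Each failed attempt corresponds to a successful update by the other party, which is the progress argument made above.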

> Can we say that each work-item has mutex access (I mean locked access) to the value located at the source address?

Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics), but this is not one.
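
For contrast, here is roughly what a spinlock built on atomics looks like. This is a host-side C11 sketch with hypothetical names (spin_mutex, spin_try_lock, spin_lock, spin_unlock, example_spin), not OpenCL code, and it says nothing about how such a lock behaves under a GPU's scheduling. The point is the structural difference: a thread holding the lock blocks everyone else until it releases it, whereas the compare-exchange loop never holds anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type built on a single atomic flag. */
typedef struct {
    atomic_flag held;
} spin_mutex;

/* Returns true if the lock was free and is now held by the caller. */
static bool spin_try_lock(spin_mutex *m)
{
    return !atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire);
}

/* Busy-waits until the lock is acquired. */
static void spin_lock(spin_mutex *m)
{
    while (!spin_try_lock(m)) {
        /* spin: another thread owns the lock and must release it first */
    }
}

/* Releases the lock so that another thread can acquire it. */
static void spin_unlock(spin_mutex *m)
{
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/* Usage: while the lock is held, a second acquisition attempt fails. */
static void example_spin(void)
{
    spin_mutex m = { ATOMIC_FLAG_INIT };
    spin_lock(&m);
    assert(!spin_try_lock(&m));
    spin_unlock(&m);
    assert(spin_try_lock(&m));
    spin_unlock(&m);
}
```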
