How can I use an OpenCL kernel to make an accumulator?

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           Number[0]++;
       }
       if (id%10==0)
       {
           Number[1]++;
       }
    }

As you can see, this is a very simple OpenCL kernel test. What I want is to count the numbers in a range that are divisible by 5 and by 10.

So here is the problem: the work items are not fully independent, because they all update the same Number[0] and Number[1]. I can't get the correct result by reading Number[0] or Number[1].

Is there any solution like a "global variable" in C++?

Thanks!

You need to use atomic operations.

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           atomic_inc(&Number[0]);
       }
       if (id%10==0)
       {
           atomic_inc(&Number[1]);
       }
    }
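As a rough illustration of why the plain `Number[0]++` miscounts: the increment is a load, an add, and a store, and two work items can interleave those steps. The sketch below is plain C, not OpenCL. It replays one such interleaving by hand, and `lost_update_demo` is a hypothetical name used only here.

```c
/* Plain C simulation (not OpenCL) of two work items executing Number[0]++
   with their load and store steps interleaved. */
int lost_update_demo(void)
{
    int number = 0;        /* stands in for Number[0] */

    int tmp_a = number;    /* work item A loads 0 */
    int tmp_b = number;    /* work item B loads 0 before A has stored */
    number = tmp_a + 1;    /* A stores 1 */
    number = tmp_b + 1;    /* B stores 1, overwriting A's increment */

    return number;         /* two increments ran, but only one survived */
}
```

`atomic_inc` performs the load, add, and store as one indivisible step, so this interleaving cannot occur.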

You should avoid atomic operations as much as possible, though: they tend to be rather slow, precisely because they have to make sure the update works correctly across threads.
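One common way to cut down on global atomics is to count inside each work-group first and then add the group's totals to global memory once. The kernel below is a minimal sketch of that idea, not part of the original answer. It assumes OpenCL 1.1 or later (where 32-bit atomics on `__local` memory are core), that the host has zeroed `Number` before the launch, and that every work item reaches both barriers. The name `cl_test_local` is hypothetical.

```c
__kernel void cl_test_local(__global int* Number)
{
    /* Per-work-group counters in fast local memory. */
    __local int local_count[2];

    int id  = get_global_id(0);
    int lid = get_local_id(0);

    /* One work item per group clears the local counters. */
    if (lid == 0)
    {
        local_count[0] = 0;
        local_count[1] = 0;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Atomics are still needed, but they now contend only within the group. */
    if (id % 5 == 0)
    {
        atomic_inc(&local_count[0]);
    }
    if (id % 10 == 0)
    {
        atomic_inc(&local_count[1]);
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    /* One global atomic per counter per work-group. */
    if (lid == 0)
    {
        atomic_add(&Number[0], local_count[0]);
        atomic_add(&Number[1], local_count[1]);
    }
}
```

Whether this is actually faster depends on the device and the work-group size, so it is worth measuring against the simple version.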

An atomic add will solve the summing problem:

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           atomic_add(&Number[0], 1);
       }
       if (id%10==0)
       {
           atomic_add(&Number[1], 1);
       }
    }
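To check the values read back from `Number`, the expected counts have a closed form. The plain C helper below is a sketch for the host side. It assumes a one-dimensional launch with no global offset, so the ids run from 0 to n-1, and `count_multiples` is a hypothetical name.

```c
/* Count the ids in [0, n) that are divisible by k (k > 0).
   The multiples are 0, k, 2k, ..., so there are ceil(n / k) of them. */
unsigned long count_multiples(unsigned long n, unsigned long k)
{
    return (n + k - 1) / k;
}
```

For a global size of 100, for example, this gives 20 for `Number[0]` (ids 0, 5, ..., 95) and 10 for `Number[1]` (ids 0, 10, ..., 90).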
