OpenCL equivalent of computing consecutive indices in CUDA

Question

In CUDA, to cover multiple blocks and thus increase the range of indices for arrays, we do something like this:

Host side code:

 dim3 dimGrid(9, 1);   // 9 blocks in total will be launched
 dim3 dimBlock(16, 1); // each block has 16 threads, so the total number of
                       // threads in the grid is 16 x 9 = 144

Device side code:

 ...
 ...     
 idx=blockIdx.x*blockDim.x+threadIdx.x;// idx will range from 0 to 143 
 a[idx]=a[idx]*a[idx];
 ...
 ...

What is the equivalent in OpenCL for achieving the above case?

Answer 1

On the host, when you enqueue your kernel using clEnqueueNDRangeKernel, you have to specify the global and local work size. For instance:

size_t global_work_size[1] = { 144 }; // 16 * 9 == 144
size_t local_work_size[1] = { 16 };
clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
                       global_work_size, local_work_size,
                       0, NULL, NULL);
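
The call above works because 144 is an exact multiple of 16. In OpenCL 1.x, if you pass an explicit local work size that does not evenly divide the global work size, clEnqueueNDRangeKernel fails with CL_INVALID_WORK_GROUP_SIZE. A common workaround is to round the global size up to the next multiple of the local size and have the kernel skip indices past the end of the array. Here is a minimal sketch of that rounding; round_up_global_size is a hypothetical helper name, not part of the OpenCL API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): returns the smallest
   multiple of local_size that is greater than or equal to n. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);  /* a work-group must contain at least one item */
    size_t remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

With this you could write size_t global_work_size[1] = { round_up_global_size(n, 16) }; where n is the element count (an assumed variable). The kernel would then need a guard such as if (idx < n) before touching a[idx].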

In your kernel, use:

size_t get_global_size(uint dim);
size_t get_global_id(uint dim);
size_t get_local_size(uint dim);
size_t get_local_id(uint dim);

to retrieve the global and local work sizes and indices respectively, where dim is 0 for x, 1 for y and 2 for z.

The equivalent of your idx will thus simply be size_t idx = get_global_id(0);

See the OpenCL Reference Pages.

Answer 2

Equivalences between CUDA and OpenCL are:

blockIdx.x*blockDim.x+threadIdx.x = get_global_id(0)

LocalSize = blockDim.x

GlobalSize = blockDim.x * gridDim.x
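
If you want to convince yourself of the first equivalence without a GPU, the mapping can be emulated on the CPU. The sketch below is plain C, not CUDA or OpenCL code. It walks every (block, thread) pair in launch order and checks that the CUDA formula yields 0, 1, 2, ... with no gaps, which is the sequence get_global_id(0) produces when no global offset is passed (the NULL fourth argument to clEnqueueNDRangeKernel). The function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* CUDA-style flat index: blockIdx.x * blockDim.x + threadIdx.x. */
size_t cuda_style_index(size_t block_idx, size_t block_dim, size_t thread_idx)
{
    assert(thread_idx < block_dim);  /* a thread index lies inside its block */
    return block_idx * block_dim + thread_idx;
}

/* Returns 1 if visiting every (block, thread) pair in order yields
   0, 1, ..., grid_dim * block_dim - 1 with no gaps, and 0 otherwise. */
int indices_are_consecutive(size_t grid_dim, size_t block_dim)
{
    size_t expected = 0;  /* plays the role of get_global_id(0) */
    for (size_t b = 0; b < grid_dim; ++b) {
        for (size_t t = 0; t < block_dim; ++t) {
            if (cuda_style_index(b, block_dim, t) != expected)
                return 0;
            ++expected;
        }
    }
    return expected == grid_dim * block_dim;
}
```

For the launch in the question, indices_are_consecutive(9, 16) checks all 144 indices, and cuda_style_index(8, 16, 15) gives the last one, 143.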

Answer 1: score 4, accepted, 2012-05-02 15:25:47

Answer 2: score 1
